ABSTRACT



The present invention provides polynucleotide sequences of the genome of 
Staphylococcus aureus, polypeptide sequences encoded by the polynucleotide 
sequences, corresponding polynucleotides and polypeptides, vectors and hosts 
comprising the polynucleotides, and assays and other uses thereof. The present 
invention further provides polynucleotide and polypeptide sequence information stored 
on computer readable media, and computer-based systems and methods which 
facilitate its use. 
What Is Claimed Is:

1. Computer readable medium having recorded thereon the nucleotide sequence depicted in SEQ ID NOS: 1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence depicted in SEQ ID NOS: 1-5,191.

2. Computer readable medium having recorded thereon any one of the fragments of SEQ ID NOS: 1-5,191 depicted in Tables 2 and 3, or a degenerate variant thereof.

3. The computer readable medium of claim 1, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.

4. The computer readable medium of claim 3, wherein said medium is selected from the group consisting of a floppy disc, a hard disc, random access memory (RAM), read only memory (ROM), and CD-ROM.

5. A computer-based system for identifying fragments of the Staphylococcus aureus genome of commercial importance, comprising the following elements:

(a) a data storage means comprising the nucleotide sequence of SEQ ID NOS: 1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence of SEQ ID NOS: 1-5,191;

(b) search means for comparing a target sequence to the nucleotide sequence of the data storage means of step (a) to identify homologous sequence(s); and

(c) retrieval means for obtaining said homologous sequence(s) of step (b).

6. A method for identifying commercially important nucleic acid fragments of the Staphylococcus aureus genome comprising the step of comparing a database comprising the nucleotide sequences depicted in SEQ ID NOS: 1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95% identical to a nucleotide sequence of SEQ ID NOS: 1-5,191 with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence is not randomly selected.

7. A method for identifying an expression modulating fragment of the Staphylococcus aureus genome comprising the step of comparing a database comprising the nucleotide sequences depicted in SEQ ID NOS: 1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95% identical to the nucleotide sequence of SEQ ID NOS: 1-5,191 with a target sequence to obtain a nucleic acid molecule comprised of a complementary nucleotide sequence to said target sequence, wherein said target sequence comprises sequences known to regulate gene expression.

8. An isolated protein-encoding nucleic acid fragment of the Staphylococcus aureus genome, wherein said fragment consists of the nucleotide sequence of any one of the fragments of SEQ ID NOS: 1-5,191 depicted in Tables 2 and 3, or a degenerate variant thereof.

9. A vector comprising any one of the fragments of the Staphylococcus aureus genome of SEQ ID NOS: 1-5,191 depicted in Tables 2 and 3, or a degenerate variant thereof.

10. An isolated fragment of the Staphylococcus aureus genome, wherein said fragment modulates the expression of an operably linked open reading frame, wherein said fragment consists of the nucleotide sequence from about 10 to 200 bases in length which is 5' to any one of the open reading frames depicted in Tables 2 and 3, or a degenerate variant thereof.

11. A vector comprising any one of the fragments of the Staphylococcus aureus genome of claim 8.



2.1 9441 1 



12. An organism which has been altered to contain any one of the fragments of the Staphylococcus aureus genome of claim 8.

13. An organism which has been altered to contain any one of the fragments of the Staphylococcus aureus genome of claim 10.

14. A method for regulating the expression of a nucleic acid molecule comprising the step of covalently attaching to said nucleic acid molecule a nucleic acid molecule consisting of the nucleotide sequence from about 10 to 100 bases 5' to any one of the fragments of the Staphylococcus aureus genome depicted in SEQ ID NOS: 1-5,191 and Tables 2 and 3, or a degenerate variant thereof.

15. An isolated nucleic acid molecule encoding a homolog of any of the fragments of the Staphylococcus aureus genome of SEQ ID NOS: 1-5,191 and Tables 2 and 3, wherein said nucleic acid molecule is produced by a process comprising the steps of:

(a) screening a genomic DNA library using as a probe a target sequence defined by any of SEQ ID NOS: 1-5,191 and Tables 2 and 3, including fragments thereof;

(b) identifying members of said library which contain sequences that hybridize to said target sequence; and

(c) isolating the nucleic acid molecules from said members identified in step (b).

16. An isolated DNA molecule encoding a homolog of any one of the fragments of the Staphylococcus aureus genome of SEQ ID NOS: 1-5,191 and Tables 2 and 3, wherein said nucleic acid molecule is produced by a process comprising the steps of:

(a) isolating mRNA, DNA, or cDNA produced from an organism;

(b) amplifying nucleic acid molecules whose nucleotide sequence is homologous to amplification primers derived from said fragment of said Staphylococcus aureus genome to prime said amplification; and

(c) isolating said amplified sequences produced in step (b).

17. An isolated polypeptide encoded by any of the fragments of the Staphylococcus aureus genome of SEQ ID NOS: 1-5,191 and depicted in Tables 2 and 3, or by a degenerate variant of said fragments.

18. An isolated polynucleotide molecule encoding any one of the polypeptides of claim 17.

19. An antibody which selectively binds to any one of the polypeptides of claim 17.

20. A kit for analyzing samples for the presence of polynucleotides derived from Staphylococcus aureus, comprising:

at least one polynucleotide containing a nucleotide sequence of any one of the fragments of SEQ ID NOS: 1-5,191 depicted in Tables 2 and 3, or a nucleotide sequence 95% identical thereto, that will hybridize to a Staphylococcus aureus polynucleotide under stringent hybridization conditions; and

a suitable container.

21. An isolated polypeptide comprising an amino acid sequence having at least 95% identity to a Staphylococcus aureus polypeptide amino acid sequence selected from the group consisting of SEQ ID NOS: 5,192 to 5,255.

22. The isolated polypeptide of Claim 21, wherein the isolated polypeptide comprises an amino acid sequence identical to that of a Staphylococcus aureus polypeptide selected from the group consisting of SEQ ID NOS: 5,192 to 5,255.

23. An isolated Staphylococcus aureus polypeptide antigen comprising at least one epitope derived from a Staphylococcus aureus polypeptide selected from the group consisting of SEQ ID NOS: 5,192 to 5,255.
24. An isolated polypeptide comprising at least one epitope encoded by a Staphylococcus aureus amino acid sequence selected from the group consisting of the epitopic sequences listed in Table 4.

25. A polypeptide of Claim 23, wherein said polypeptide is fixed to a solid phase.

26. A diagnostic kit for detecting Staphylococcus aureus infection comprising:

(a) an isolated polypeptide antigen of Claim 23; and

(b) means for detecting the binding of an antibody contained in a biological fluid to said antigen.

27. A vaccine composition comprising a polypeptide of Claim 23 present in a pharmaceutically acceptable carrier.

28. A method of vaccinating an individual against Staphylococcus aureus infection comprising administering to an individual the vaccine composition of Claim 27.

29. A method for producing a polypeptide in a host cell comprising the steps of:

(a) incubating a host containing a heterologous nucleic acid molecule whose nucleotide sequence consists of any one of the fragments of the Staphylococcus aureus genome of SEQ ID NOS: 1-5,191 and depicted in Tables 2 and 3, under conditions where said heterologous nucleic acid molecule is expressed to produce said protein; and

(b) isolating said protein.
Staphylococcus aureus Polynucleotides and Sequences



FIELD OF THE INVENTION 

The present invention relates to the field of molecular biology. In particular, it 
relates to, among other things, nucleotide sequences of Staphylococcus aureus, 
contigs, ORFs, fragments, probes, primers and related polynucleotides thereof, 
peptides and polypeptides encoded by the sequences, and uses of the polynucleotides 
and sequences thereof, such as in fermentation, polypeptide production, assays and 
pharmaceutical development, among others. 



BACKGROUND OF THE INVENTION

The genus Staphylococcus includes at least 20 distinct species. (For a review, see Novick, R. P., The Staphylococcus as a Molecular Genetic System, Chapter 1, pgs. 1-37 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI, R. Novick, Ed., VCH Publishers, New York (1990).) As measured by hybridization kinetics, species differ from one another by 80% or more, whereas strains within a species are at least 90% identical by the same measure.

The species Staphylococcus aureus, a gram-positive, facultatively aerobic, clump-forming coccus, is among the most important etiological agents of bacterial infection in humans, as discussed briefly below.



Human Health and S. aureus

Staphylococcus aureus is a ubiquitous pathogen. (See, for instance, Mims et al., MEDICAL MICROBIOLOGY, Mosby-Year Book Europe Limited, London, UK (1993).) It is an etiological agent of a variety of conditions, ranging in severity from mild to fatal. A few of the more common conditions caused by S. aureus infection are burn wound infections, cellulitis, eyelid infections, food poisoning, joint infections, neonatal conjunctivitis, osteomyelitis, skin infections, surgical wound infection, scalded skin syndrome and toxic shock syndrome, some of which are described further below.

Burns

Burn wounds generally are sterile initially. However, they generally compromise physical and immune barriers to infection, cause loss of fluid and electrolytes, and result in local or general physiological dysfunction. After cooling, contact with viable bacteria results in mixed colonization at the injury site. Infection may be restricted to the non-viable debris on the burn surface ("eschar"); it may progress into full skin infection and invade viable tissue below the eschar; and it may reach below the skin, enter the lymphatic and blood circulation, and develop into septicaemia. S. aureus is among the most important pathogens typically found in burn wound infections. It can destroy granulation tissue and produce severe septicaemia.

Cellulitis

Cellulitis, an acute infection of the skin that expands from a typically superficial origin to spread below the cutaneous layer, most commonly is caused by S. aureus in conjunction with S. pyogenes. Cellulitis can lead to systemic infection. In fact, cellulitis can be one aspect of synergistic bacterial gangrene. This condition typically is caused by a mixture of S. aureus and microaerophilic streptococci. It causes necrosis, and treatment is limited to excision of the necrotic tissue. The condition often is fatal.

Eyelid infections

S. aureus is the cause of styes and of "sticky eye" in neonates, among other eye infections. Typically such infections are limited to the surface of the eye, but they occasionally penetrate the surface with more severe consequences.

Food poisoning

Some strains of S. aureus produce one or more of five serologically distinct, heat- and acid-stable enterotoxins (enterotoxins A-E) that are not destroyed by the digestive processes of the stomach and small intestine. Ingestion of the toxin in sufficient quantities typically results in severe vomiting, but not diarrhoea. The effect does not require viable bacteria. Although the toxins are known, their mechanism of action is not understood.

Joint infections

S. aureus infects bone joints, causing diseases such as osteomyelitis.

Osteomyelitis

S. aureus is the most common causative agent of haematogenous osteomyelitis. The disease tends to occur in children and adolescents more than in adults, and it is associated with non-penetrating injuries to bones. Infection typically occurs in long bones that are still growing, hence its occurrence in physically immature populations. Most often, infection is localized in the vicinity of sprouting capillary loops adjacent to the epiphyseal growth plates at the ends of long, growing bones.

Skin infections

S. aureus is the most common pathogen of such minor skin infections as abscesses and boils. Such infections often are resolved by normal host response mechanisms, but they also can develop into severe internal infections. Recurrent infections of the nasal passages plague nasal carriers of S. aureus.

Surgical Wound Infections

Surgical wounds often penetrate far into the body. Infection of such wounds thus poses a grave risk to the patient. S. aureus is the most important causative agent of infections in surgical wounds. S. aureus is unusually adept at invading surgical wounds; sutured wounds can be infected by far fewer S. aureus cells than are necessary to cause infection in normal skin. Invasion of a surgical wound can lead to severe S. aureus septicaemia. Invasion of the blood stream by S. aureus can lead to seeding and infection of internal organs, particularly heart valves and bone, causing systemic diseases such as endocarditis and osteomyelitis.






Scalded Skin Syndrome

S. aureus is responsible for "scalded skin syndrome" (also called toxic epidermal necrosis, Ritter's disease and Lyell's disease). This disease occurs in older children, typically in outbreaks caused by the proliferation of S. aureus strains that produce exfoliatin (also called scalded skin syndrome toxin). Although the bacteria initially may infect only a minor lesion, the toxin destroys intercellular connections, separates epidermal layers and allows the infection to penetrate the outer layer of the skin, producing the desquamation that typifies the disease. Shedding of the outer layer of skin generally reveals normal skin below, but fluid lost in the process can produce severe injury in young children if it is not treated properly.

Toxic Shock Syndrome

Toxic shock syndrome is caused by strains of S. aureus that produce the so-called toxic shock syndrome toxin. The disease can be caused by S. aureus infection at any site, but it is too often erroneously viewed as a disease solely of women who use tampons. The disease involves toxaemia and septicaemia, and can be fatal.

Nosocomial Infections

In the 1984 National Nosocomial Infection Surveillance Study ("NNIS"), S. aureus was the most prevalent agent of surgical wound infections in many hospital services, including medicine, surgery, obstetrics, pediatrics and newborns.

Resistance to drugs of S. aureus strains

Prior to the introduction of penicillin, the prognosis for patients seriously infected with S. aureus was unfavorable. Following the introduction of penicillin in the early 1940s, even the worst S. aureus infections generally could be treated successfully. The emergence of penicillin-resistant strains of S. aureus did not take long, however. Most strains of S. aureus encountered in hospital infections today do not respond to penicillin, although, fortunately, this is not the case for S. aureus encountered in community infections.
It is now well known that penicillin-resistant strains of S. aureus produce a β-lactamase which converts penicillin to penicilloic acid, and thereby destroys antibiotic activity. Furthermore, the β-lactamase gene often is propagated episomally, typically on a plasmid, and often is only one of several genes on an episomal element that, together, confer multidrug resistance.

Methicillins, introduced in the 1960s, largely overcame the problem of penicillin resistance in S. aureus. These compounds conserve the portions of penicillin responsible for antibiotic activity and modify or alter other portions that make penicillin a good substrate for inactivating β-lactamases. However, methicillin resistance has emerged in S. aureus, along with resistance to many other antibiotics effective against this organism, including aminoglycosides, tetracycline, chloramphenicol, macrolides and lincosamides. In fact, methicillin-resistant strains of S. aureus generally are multiply drug resistant.

The molecular genetics of most types of drug resistance in S. aureus has been elucidated (see Lyon et al., Microbiological Reviews 51: 88-134 (1987)). Generally, resistance is mediated by plasmids, as noted above regarding penicillin resistance; however, several stable forms of drug resistance have been observed that apparently involve integration of a resistance element into the S. aureus genome itself.

Thus far, each new antibiotic has given rise to resistant strains, strains have emerged that are resistant to multiple drugs, and increasingly persistent forms of resistance have begun to emerge. Drug resistance of S. aureus infections already poses significant treatment difficulties, which are likely to get much worse unless new therapeutic agents are developed.

Molecular Genetics of Staphylococcus aureus

Despite its importance in, among other things, human disease, relatively little is known about the genome of this organism.

Most genetic studies of S. aureus have been carried out using the strain NCTC 8325, which contains prophages φ11, φ12 and φ13, and the UV-cured derivative of this strain, 8325-4 (also referred to as RN450), which is free of the prophages.
These studies revealed that the S. aureus genome, like that of other staphylococci, consists of one circular, covalently closed, double-stranded DNA molecule and a collection of so-called variable accessory genetic elements, such as prophages, plasmids, transposons and the like.

Physical characterization of the genome has not been carried out in any detail. Pattee et al. published a low-resolution and incomplete genetic and physical map of the chromosome of S. aureus strain NCTC 8325. (Pattee et al., Genetic and Physical Mapping of Chromosome of Staphylococcus aureus NCTC 8325, Chapter 11, pgs. 163-169 in MOLECULAR BIOLOGY OF THE STAPHYLOCOCCI, R. P. Novick, Ed., VCH Publishers, New York (1990).) The genetic map largely was produced by mapping insertions of Tn551 and Tn4001, which confer erythromycin and gentamicin resistance, respectively, and by analysis of SmaI-digested DNA by Pulsed Field Gel Electrophoresis ("PFGE").

The map was of low resolution; even estimating the physical size of the genome was difficult, according to the investigators. The largest SmaI chromosome fragment, for instance, was too large for accurate sizing by PFGE. To estimate its size, additional restriction sites had to be introduced into the chromosome using a transposon containing a SmaI recognition sequence.

In sum, most physical characteristics and almost all of the genes of Staphylococcus aureus are unknown. Among the few genes that have been identified, most have not been physically mapped or characterized in detail. Only a very few genes of this organism have been sequenced. (See, for instance, Thornsberry, J. Antimicrobial Chemotherapy 21 Suppl. C: 9-16 (1988), current versions of GENBANK and other nucleic acid databases, and references that relate to the genome of S. aureus such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S. aureus infection involves the programmed expression of S. aureus genes, and that characterizing the genes and their patterns of expression would add dramatically to our understanding of the organism and its host interactions. Knowledge of S. aureus genes and genomic organization would dramatically improve understanding of disease etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of S. aureus would provide reagents for, among other things, detecting, characterizing and controlling S. aureus infections. There is therefore a need to characterize the genome of S. aureus and to provide polynucleotides and sequences of this organism.

SUMMARY OF THE INVENTION

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome. The primary nucleotide sequences which were generated are provided in SEQ ID NOS: 1-5,191.

The present invention provides the nucleotide sequences of several thousand contigs of the Staphylococcus aureus genome, which are listed in the tables below and set out in the Sequence Listing submitted herewith, and representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the nucleotide sequences depicted in SEQ ID NOS: 1-5,191.

The present invention further provides nucleotide sequences which are at least 95% identical to the nucleotide sequences of SEQ ID NOS: 1-5,191.
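The 95% identity criterion can be made concrete with a short calculation. The following Python sketch is illustrative only: it assumes the two sequences have already been aligned position by position without gaps, and the function names are hypothetical. How identity is scored for gapped alignments is not specified in this passage and is outside the scope of the sketch.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, already-aligned sequences.

    Assumption: positions correspond one-to-one (no gaps); case is ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_identity_threshold(seq_a: str, seq_b: str, threshold: float = 95.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(seq_a, seq_b) >= threshold


# One mismatch in 20 aligned bases gives exactly 95% identity.
print(percent_identity("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"))
```

Under this simple definition, a 20-base sequence tolerates one mismatch and still meets the 95% threshold, while two mismatches drop it to 90%.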

The nucleotide sequence of SEQ ID NOS: 1-5,191, a representative fragment thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS: 1-5,191 may be provided in a variety of media to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and magnetic tape; optical storage media, such as CD-ROM; electrical storage media, such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media.

The present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Staphylococcus aureus genome.
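As a rough illustration of how such a system might pair a data storage means with search and retrieval means, the Python sketch below indexes stored sequences by shared k-mers. It is a hypothetical stand-in, not the system described in this document. The class name, the k-mer length and the ranking by shared k-mer count are all assumptions; a production system would more likely use an alignment tool such as BLAST.

```python
from collections import defaultdict
from typing import Iterable


class SequenceStore:
    """Hypothetical in-memory stand-in for a 'data storage means'."""

    def __init__(self, k: int = 8):
        self.k = k
        self.sequences: dict[str, str] = {}
        # Maps each k-mer to the IDs of stored sequences that contain it.
        self.index: dict[str, set[str]] = defaultdict(set)

    def add(self, seq_id: str, sequence: str) -> None:
        sequence = sequence.upper()
        self.sequences[seq_id] = sequence
        for i in range(len(sequence) - self.k + 1):
            self.index[sequence[i:i + self.k]].add(seq_id)

    def search(self, target: str, min_shared: int = 1) -> list[tuple[str, int]]:
        """'Search means': rank stored sequences by distinct k-mers shared with target."""
        target = target.upper()
        counts: dict[str, int] = defaultdict(int)
        seen: set[str] = set()
        for i in range(len(target) - self.k + 1):
            kmer = target[i:i + self.k]
            if kmer in seen:
                continue  # count each distinct k-mer of the target once
            seen.add(kmer)
            for seq_id in self.index.get(kmer, ()):
                counts[seq_id] += 1
        hits = [(sid, n) for sid, n in counts.items() if n >= min_shared]
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

    def retrieve(self, seq_ids: Iterable[str]) -> dict[str, str]:
        """'Retrieval means': return the stored sequences for the given hits."""
        return {sid: self.sequences[sid] for sid in seq_ids}


store = SequenceStore(k=4)
store.add("contig_1", "ACGTTGCAACGT")
store.add("contig_2", "GGGGCCCCAAAA")
hits = store.search("TTGCAA")
print(hits, store.retrieve(sid for sid, _ in hits))
```

Here the target shares three 4-mers (TTGC, TGCA, GCAA) with the hypothetical `contig_1` and none with `contig_2`, so only `contig_1` is reported and retrieved.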
Another embodiment of the present invention is directed to fragments of the Staphylococcus aureus genome having particular structural or functional attributes. Such fragments of the Staphylococcus aureus genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter referred to as "open reading frames" or "ORFs," fragments which modulate the expression of an operably linked ORF, hereinafter referred to as "expression modulating fragments" or "EMFs," and fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter referred to as "diagnostic fragments" or "DFs."

Each of the ORFs in fragments of the Staphylococcus aureus genome disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in a sample, to selectively control gene expression in a host, and in the production of polypeptides, such as polypeptides encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.
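To illustrate the relationship between an ORF and the EMF region 5' to it, the following Python sketch scans the forward strand in three reading frames for ATG-initiated ORFs and extracts the upstream bases as a candidate EMF. It is a simplified, hypothetical example and not the zorf program mentioned in the figure description: it ignores the reverse strand and alternative bacterial start codons such as GTG and TTG. The minimum ORF length is an assumed parameter, and the default 200-base upstream window mirrors the upper bound recited in claim 10 but is otherwise an assumption.

```python
from typing import Iterator

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 30) -> Iterator[tuple[int, int, int]]:
    """Yield (start, end, frame) for ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. Only the first ATG before
    each stop codon is used as the start.
    """
    seq = sequence.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3, frame
                start = None


def upstream_region(sequence: str, orf_start: int, length: int = 200) -> str:
    """Return up to `length` bases immediately 5' of an ORF (a candidate EMF)."""
    return sequence[max(0, orf_start - length):orf_start]


contig = "CCCCC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
for start, end, frame in find_orfs(contig, min_codons=5):
    print(start, end, frame, upstream_region(contig, start))
```

For this toy contig, the only ORF runs from position 5 to 20 in frame 2, and the candidate EMF is the five bases `CCCCC` upstream of it.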

The present invention further includes recombinant constructs comprising one or more fragments of the Staphylococcus aureus genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Staphylococcus aureus genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments of the Staphylococcus aureus genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a prokaryotic cell, such as a bacterial cell.

The present invention is further directed to isolated polypeptides and proteins encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and proteins of the present invention from cells which have been altered to express them.

The invention further provides polypeptides comprising Staphylococcus aureus epitopes and vaccine compositions comprising such polypeptides. Also provided are methods for vaccinating an individual against Staphylococcus aureus infection.

The invention further provides methods of obtaining homologs of the fragments of the Staphylococcus aureus genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs or antigens of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or a product produced therefrom.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays. Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: (a) a first container comprising one of the antibodies, antigens, or one of the DFs of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents, and reagents capable of detecting the presence of bound antibodies, antigens or hybridized DFs.
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present invention. Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise the steps of: (a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and (b) determining whether the agent binds to said protein.

The present genomic sequences of Staphylococcus aureus will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Staphylococcus aureus genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Staphylococcus aureus researchers and of immediate commercial value for the production of proteins or to control gene expression.

The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes have greatly enhanced, and will continue to enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative genomics and molecular phylogeny.

DESCRIPTION OF THE FIGURES

FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit and annotate the contigs of the Staphylococcus aureus genome of the present invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix-based Staphylococcus aureus relational database. Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and their associated features using extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence file is processed by seq_filter to trim portions of the sequences with more than 2% ambiguous nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic Research ("TIGR") for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs generated by the assembly step is loaded into the database with the lassie program. Identification of open reading frames (ORFs) is accomplished by processing contigs with zorf. The ORFs are searched against S. aureus sequences from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Results of the ORF determination and similarity searching steps were loaded into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
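
The sequence filtering step above is characterized only by its 2% threshold. A minimal sketch of such a filter follows, assuming (neither detail is specified above) that any character other than A, C, G or T counts as ambiguous and that trimming removes bases from the 3' end until the retained portion meets the threshold. The function names are hypothetical, and this is not the original program.

```python
def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not one of the four unambiguous bases."""
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous / len(seq)


def trim_ambiguous_tail(seq: str, max_fraction: float = 0.02) -> str:
    """Return the longest prefix whose ambiguous-base fraction is at most
    max_fraction (an assumed 3'-end trimming rule)."""
    best_length = 0
    ambiguous = 0
    for length, base in enumerate(seq.upper(), start=1):
        if base not in "ACGT":
            ambiguous += 1
        if ambiguous / length <= max_fraction:
            best_length = length
    return seq[:best_length]


read = "ACGT" * 25 + "NNNNN"  # 100 clean bases followed by 5 ambiguous calls
trimmed = trim_ambiguous_tail(read)
print(len(trimmed), round(ambiguous_fraction(trimmed), 4))  # prints 102 0.0196
```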

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The present invention is based on the sequencing of fragments of the Staphylococcus aureus genome and analysis of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID NOS:1-5,191. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.)

In addition to the aforementioned Staphylococcus aureus polynucleotide and polypeptide sequences, the present invention provides the nucleotide sequences of SEQ ID NOS:1-5,191, or representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.
As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS:1-5,191" refers to any portion of SEQ ID NOS:1-5,191 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Staphylococcus aureus open reading frames ("ORFs"), expression modulating fragments ("EMFs") and fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample ("DFs"). A non-limiting identification of preferred representative fragments is provided in Tables 1-3.

As discussed in detail below, the information provided in SEQ ID NOS:1-5,191 and in Tables 1-3, together with routine cloning, synthesis, sequencing and assay methods, will enable those skilled in the art to clone and sequence all "representative fragments" of interest, including open reading frames encoding a large variety of Staphylococcus aureus proteins.

While the presently disclosed sequences of SEQ ID NOS:1-5,191 are highly accurate, sequencing techniques are not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-5,191. However, once the present invention is made available (i.e., once the information in SEQ ID NOS:1-5,191 and Tables 1-3 has been made available), resolving a rare sequencing error in SEQ ID NOS:1-5,191 will be well within the skill of the art. The present disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides may proceed in like manner using manual and automated sequencing methods, which are employed ubiquitously in the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques, potential errors readily may be identified, and the correct sequence then may be ascertained by targeting further sequencing effort, also of a routine nature, to the region containing the potential error.

Even if all of the very rare sequencing errors in SEQ ID NOS:1-5,191 were corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191.

As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided below, for instance. A wide variety of Staphylococcus aureus strains that can be used to prepare S. aureus genomic DNA for cloning and for obtaining polynucleotides of the present invention are available to the public from recognized depository institutions, such as the American Type Culture Collection ("ATCC").



The nucleotide sequences of the genomes from different strains of Staphylococcus aureus differ somewhat. However, the nucleotide sequences of the genomes of all Staphylococcus aureus strains will be at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID NOS:1-5,191. Nearly all will be at least 99% identical and the great majority will be 99.9% identical.

Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191, in a form which can be readily used, analyzed and interpreted by the skilled artisan.

Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-5,191 are routine and readily available to the skilled artisan. For example, the well known fasta algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988) can be used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate an identity score of polynucleotides compared to one another.
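
Whatever program produces the alignment, percent identity is ultimately the ratio of identical aligned positions to aligned length. The sketch below assumes the two sequences have already been aligned (for example, by fasta or BLASTN) and are supplied as equal-length strings with "-" marking gaps. It counts identities over all alignment columns, which is only one of several conventions in use, and it is not the fasta algorithm itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Gap characters ('-') never count as matches but do count toward the
    alignment length (one common convention; tools differ on this point).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the aligned pair reaches the given percent identity (e.g. 95.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # prints 90.0
print(meets_threshold("ACG-T", "ACGAT", 95.0))       # prints False (80% identity)
```

At the 99.9% level, for example, a 1,000-nucleotide aligned stretch may differ from the reference at no more than one position.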

COMPUTER RELATED EMBODIMENTS

The nucleotide sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS:1-5,191 may be "provided" in a variety of media to facilitate use thereof. As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS:1-5,191. Such a manufacture provides a large portion of the Staphylococcus aureus genome and parts thereof (e.g., a Staphylococcus aureus open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Staphylococcus aureus genome or a subset thereof as it exists in nature or in purified form.

In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage media, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention. Likewise, it will be clear to those of skill how additional computer readable media that may be developed also can be used to create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.






As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention.

A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
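
As one concrete illustration of an ASCII representation, the sketch below writes and reads sequences in FASTA format. FASTA is a widely used plain-text layout that the text above does not name; it is assumed here only for illustration, and the contig name is a hypothetical placeholder.

```python
import io
from typing import Dict, List, Optional, TextIO


def write_fasta(records: Dict[str, str], handle: TextIO, width: int = 60) -> None:
    """Write each sequence as a '>' header line followed by fixed-width lines."""
    for name, seq in records.items():
        handle.write(f">{name}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text back into a {name: sequence} dictionary."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


buffer = io.StringIO()
write_fasta({"contig_example_1": "ACGT" * 40}, buffer)  # hypothetical contig name
buffer.seek(0)
print(read_fasta(buffer)["contig_example_1"][:12])  # prints ACGTACGTACGT
```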

Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a sequence of SEQ ID NOS:1-5,191, the present invention enables the skilled artisan routinely to access the provided sequence information for a wide variety of purposes.

The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17: 203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Staphylococcus aureus genome which contain homology to ORFs or proteins from both Staphylococcus aureus and other organisms. Among the ORFs discussed herein are protein-encoding fragments of the Staphylococcus aureus genome useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.

The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify, among other things, commercially important fragments of the Staphylococcus aureus genome.

As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.

As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.

As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
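
A relational database is one form of data storage means. The sketch below uses Python's built-in sqlite3 module as a stand-in for the DB2, Sybase or Oracle systems named above; the table layout (contig name plus sequence) and the contig names are assumptions made for illustration only.

```python
import sqlite3
from typing import Dict, List


def build_store(contigs: Dict[str, str]) -> sqlite3.Connection:
    """Create an in-memory table holding contig names and sequences."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contig (name TEXT PRIMARY KEY, sequence TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO contig (name, sequence) VALUES (?, ?)", contigs.items()
    )
    conn.commit()
    return conn


def contigs_containing(conn: sqlite3.Connection, target: str) -> List[str]:
    """Names of contigs whose stored sequence contains the target string exactly."""
    rows = conn.execute(
        "SELECT name FROM contig WHERE instr(sequence, ?) > 0 ORDER BY name",
        (target,),
    )
    return [name for (name,) in rows]


store = build_store({"contig_A": "TTGACAATTT", "contig_B": "GGGCCC"})  # hypothetical
print(contigs_containing(store, "TGACA"))  # prints ['contig_A']
```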

As used herein, "search means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the present genomic sequences which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly, and a variety of commercially available software packages for conducting searches are available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.

As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
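
The statement that longer targets are less likely to occur at random can be made quantitative under a simple model. Assuming independent bases that each occur with probability 1/4 (a simplification; real genomes have skewed base composition), the expected number of chance exact matches of a k-nucleotide target in a sequence of G nucleotides, counting both strands, is about 2(G - k + 1)/4^k. The 1,000,000-nucleotide length below is a hypothetical value chosen for illustration.

```python
def expected_random_hits(genome_length: int, k: int) -> float:
    """Expected chance exact matches of a k-mer on both strands, assuming
    independent, equiprobable bases."""
    if k > genome_length:
        return 0.0
    positions_per_strand = genome_length - k + 1
    return 2 * positions_per_strand / 4 ** k


for k in (6, 10, 20, 30):
    print(k, expected_random_hits(1_000_000, k))
```

Under these assumptions, a six-nucleotide target is expected to appear several hundred times by chance in such a sequence, whereas a 20-nucleotide target is essentially never expected, which is consistent with the preferred target lengths stated above.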

As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art. Protein target motifs include, but are not limited to, enzyme active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
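
Of the nucleic acid motifs listed, a hairpin is perhaps the simplest to express as a search rule: a stem, a short loop, and then the reverse complement of the stem. The sketch below looks only for perfectly paired stems, and its default stem and loop lengths are hypothetical. Real structure-prediction tools also weigh mismatches, G-U pairs and folding free energy, none of which is modelled here.

```python
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(
    seq: str, stem_length: int = 4, min_loop: int = 3, max_loop: int = 8
) -> Iterator[Tuple[int, int]]:
    """Yield (0-based stem start, loop length) wherever a stem is followed,
    after a loop of min_loop..max_loop bases, by its exact reverse complement."""
    seq = seq.upper()
    for start in range(len(seq) - 2 * stem_length - min_loop + 1):
        partner = reverse_complement(seq[start:start + stem_length])
        for loop in range(min_loop, max_loop + 1):
            right = start + stem_length + loop
            if right + stem_length > len(seq):
                break
            if seq[right:right + stem_length] == partner:
                yield start, loop


print(list(find_hairpins("TTGCGCAAAAGCGCTT")))  # stem GCGC, loop AAAA
```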

A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Staphylococcus aureus genomic sequences possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
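
A ranked output of the kind described can be sketched with an ungapped sliding-window comparison: every window of the stored sequence that is as long as the target is scored by percent identity, and the windows are sorted from most to least similar. This is a deliberately simple stand-in for BLAST-style searching (no gaps, no significance statistics, one strand only), and the function name and `top_n` parameter are hypothetical.

```python
from typing import List, Tuple


def ranked_matches(genome: str, target: str, top_n: int = 3) -> List[Tuple[float, int]]:
    """Return up to top_n (percent identity, 1-based start) pairs, best first."""
    genome, target = genome.upper(), target.upper()
    k = len(target)
    scored = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        identical = sum(1 for a, b in zip(window, target) if a == b)
        scored.append((100.0 * identical / k, start + 1))
    # Highest identity first; ties broken by earlier position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top_n]


for identity, position in ranked_matches("GGTATAATGGTATCATGG", "TATAAT"):
    print(f"position {position}: {identity:.1f}% identical")
```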

A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Staphylococcus aureus genome. In the present examples, software implementing the BLAST and BLAZE algorithms (described, respectively, in Altschul et al., J. Mol. Biol. 215: 403-410 (1990) and Brutlag et al., Comp. Chem. 17: 203-207 (1993)) was used to identify open reading frames within the Staphylococcus aureus genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention. Of course, suitable proprietary systems that may be known to those of skill also may be employed in this regard.

Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114 once it is inserted into the removable medium storage device 114.

A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108, in accordance with the requirements and operating parameters of the operating system, the hardware system and the software program or programs.

BIOCHEMICAL EMBODIMENTS

Other embodiments of the present invention are directed to isolated fragments of the Staphylococcus aureus genome. The fragments of the Staphylococcus aureus genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), and fragments which can be used to diagnose the presence of Staphylococcus aureus in a sample, hereinafter diagnostic fragments (DFs).

As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Staphylococcus aureus genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-5,191, to representative fragments thereof as described above, and to polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence thereto, also as set out above.

A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to, methods which separate constituents of a solution based on charge, solubility, or size.

In one embodiment, Staphylococcus aureus DNA can be mechanically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Staphylococcus aureus library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF, such as those enumerated in Tables 1-3, can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-5,191. Well known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library of Staphylococcus aureus genomic DNA. Thus, given the availability of SEQ ID NOS:1-5,191, the information in Tables 1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-5,191 using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or other nucleic acid fragment of the present invention.
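
The primer step can be illustrated with a deliberately simplified sketch: the stretch immediately upstream of an ORF serves as the forward primer, and the reverse complement of the stretch immediately downstream serves as the reverse primer. The 20-nucleotide primer length and the 1-based, inclusive ORF coordinates are assumptions. Practical primer design also checks melting temperature, GC content and secondary structure, which this sketch does not.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(
    contig: str, orf_start: int, orf_end: int, length: int = 20
) -> Tuple[str, str]:
    """Return (forward, reverse) primers flanking an ORF given 1-based,
    inclusive coordinates on the contig's forward strand."""
    upstream_begin = orf_start - 1 - length
    downstream_end = orf_end + length
    if upstream_begin < 0 or downstream_end > len(contig):
        raise ValueError("not enough flanking sequence for primers of this length")
    forward = contig[upstream_begin:orf_start - 1]
    reverse = reverse_complement(contig[orf_end:downstream_end])
    return forward, reverse


toy_contig = "A" * 20 + "ATGAAATAA" + "C" * 20  # hypothetical ORF at positions 21-29
print(flanking_primers(toy_contig, 21, 29))
```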

The isolated nucleic acid molecules of the present invention include, but are not limited to, single stranded and double stranded DNA, and single stranded RNA.

As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein.
25 into protein. 

Tables 1, 2 and 3 list ORFs in the Staphylococcus aureus genomic contigs of the present invention that were identified as putative coding regions by the GeneMark software using organism-specific second-order Markov probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical methods, such as those discussed herein, to generate more inclusive, more restrictive or more selective lists.
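
GeneMark's own models are not reproduced here. The underlying idea of a second-order Markov transition matrix can be sketched, however: each base is predicted from the two bases before it, using probabilities estimated from training sequences, and a candidate region is scored by the log-likelihood ratio of a "coding" model against a "non-coding" model. This simplified version is homogeneous (GeneMark uses frame-dependent, periodic models), uses add-one pseudocounts, and trains on toy sequences. All names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable

BASES = "ACGT"
Model = Dict[str, Dict[str, float]]


def train_second_order(sequences: Iterable[str]) -> Model:
    """Estimate P(next base | previous two bases) with add-one pseudocounts."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 1))
    for seq in sequences:
        seq = seq.upper()
        for i in range(2, len(seq)):
            context, base = seq[i - 2:i], seq[i]
            if base in BASES and all(c in BASES for c in context):
                counts[context][base] += 1
    model: Model = {}
    for context in ("".join(pair) for pair in product(BASES, repeat=2)):
        row = counts[context]
        total = sum(row.values())
        model[context] = {base: row[base] / total for base in BASES}
    return model


def log_odds(seq: str, coding: Model, noncoding: Model) -> float:
    """Sum of log(P_coding / P_noncoding) over each base given its two predecessors."""
    seq = seq.upper()
    score = 0.0
    for i in range(2, len(seq)):
        context, base = seq[i - 2:i], seq[i]
        if context in coding and base in BASES:
            score += math.log(coding[context][base] / noncoding[context][base])
    return score


coding_model = train_second_order(["ATGGCTGCTGCTGCTGCT" * 5])       # toy training data
noncoding_model = train_second_order(["ATATATATATATTATATAAT" * 5])  # toy training data
print(log_odds("GCTGCTGCTGCT", coding_model, noncoding_model))  # positive: coding-like
```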
Table 1 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are at least 80 amino acids long and that, over a continuous region of at least 50 bases, are 95% or more identical (by BLAST analysis) to an S. aureus nucleotide sequence available through GenBank in November 1996.

Table 2 sets out ORFs in the Staphylococcus aureus contigs of the present invention that are not in Table 1 and match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through GenBank by September 1996.

Table 3 sets out ORFs in the Staphylococcus aureus contigs of the present invention that do not match significantly, by BLASTP analysis, a polypeptide sequence available through GenBank by September 1996.
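
Taken together, the three table definitions amount to a simple decision rule, restated in the sketch below. The argument names are hypothetical, and the inputs (ORF length, the longest continuous region that is at least 95% identical to an S. aureus GenBank nucleotide entry, and the best BLASTP probability) are assumed to have been computed beforehand by BLAST.

```python
from typing import Optional


def assign_table(
    length_aa: int,
    saureus_identical_region_nt: int,
    best_blastp_p: Optional[float],
) -> int:
    """Return 1, 2 or 3 following the table definitions.

    saureus_identical_region_nt: longest continuous region (in bases) that is at
        least 95% identical to an S. aureus GenBank nucleotide sequence, 0 if none.
    best_blastp_p: smallest BLASTP probability against GenBank polypeptides,
        or None if there was no hit at all.
    """
    if length_aa >= 80 and saureus_identical_region_nt >= 50:
        return 1
    if best_blastp_p is not None and best_blastp_p <= 0.01:
        return 2
    return 3


print(assign_table(120, 300, 1e-50), assign_table(120, 30, 1e-5), assign_table(60, 0, None))
```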

In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number within the contig; the third column indicates the reading frame, taking the first 5' nucleotide of the contig as the start of the +1 frame; the fourth column indicates the first nucleotide of the ORF, counting from the 5' end of the contig strand; and the fifth column indicates the length of each ORF in nucleotides.

In Tables 1 and 2, column six lists the "Reference" for the closest matching sequence available through GenBank. These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be familiar with their designations. Descriptions of the nomenclature are available from the National Center for Biotechnology Information. Column seven in Tables 1 and 2 provides the "gene name" of the matching sequence; column eight provides the BLAST "identity" score from the comparison of the ORF and the homologous gene; and column nine indicates the length in nucleotides of the "highest scoring segment pair" identified by the BLAST identity analysis.

In Table 3, the last column, column six, indicates the length of each ORF in amino acid residues.

The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example, at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence similarity, such as fasta and BLAST, specifically list percent identity of a matching region as an output parameter. Thus, for instance, Tables 1 and 2 herein enumerate the "percent identity" of the "highest scoring segment pair" in each ORF and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided below and are described in the pertinent literature highlighted by the citations provided below.
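
The 70% versus 80% example can be reproduced directly. In the sketch below, "similar" is decided by membership in one of a few groups of biochemically related residues. This grouping is a hypothetical simplification chosen for illustration; programs such as BLAST and fasta judge similarity from a substitution scoring matrix instead.

```python
from typing import Tuple

# Hypothetical groups of biochemically related residues (illustration only).
SIMILAR_GROUPS = [set("ILVM"), set("DE"), set("KR"), set("FYW"), set("ST"), set("NQ"), set("AG")]


def is_similar(a: str, b: str) -> bool:
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(seq_a: str, seq_b: str) -> Tuple[float, float]:
    """Percent identity and percent similarity of two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    similar = sum(is_similar(a, b) for a, b in zip(seq_a, seq_b))
    return 100.0 * identical / n, 100.0 * similar / n


# Differences at positions 1 (A/W), 3 (K/C) and 5 (M/L); only M/L counts as similar.
print(identity_and_similarity("ADKLMNPQRS", "WDCLLNPQRS"))  # prints (70.0, 80.0)
```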

It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled artisan can readily identify ORFs in contigs of the Staphylococcus aureus genome other than those listed in Tables 1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF, in addition to those ascertainable using the computer-based systems of the present invention.

As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.

As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.

EMF sequences can be identified within the contigs of the Staphylococcus aureus genome by their proximity to the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to fragments of the Staphylococcus aureus genome which are between two ORFs herein described. EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention. Further, the two methods can be combined and used together.
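
Locating candidate EMFs by proximity to ORFs reduces to finding the gaps between annotated ORFs. The sketch below takes hypothetical ORF coordinates (1-based, inclusive, non-overlapping) on one strand and returns the intergenic segments between consecutive ORFs. It then keeps, for each segment of at least 10 nucleotides, at most the 200 nucleotides lying immediately 5' of the downstream ORF, reflecting the 10 to 200 nucleotide range mentioned above. Choosing the stretch just upstream of an ORF is an assumption of this sketch, and strand orientation is ignored.

```python
from typing import List, Tuple


def intergenic_segments(orfs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) of gaps between consecutive, non-overlapping ORFs."""
    ordered = sorted(orfs)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps


def candidate_emfs(
    contig: str, orfs: List[Tuple[int, int]], min_len: int = 10, max_len: int = 200
) -> List[str]:
    """Up to max_len nucleotides immediately 5' of each downstream ORF."""
    candidates = []
    for start, end in intergenic_segments(orfs):
        if end - start + 1 < min_len:
            continue
        clipped_start = max(start, end - max_len + 1)
        candidates.append(contig[clipped_start - 1:end])
    return candidates


toy_contig = "A" * 30 + "C" * 30 + "G" * 60                 # hypothetical contig
print(candidate_emfs(toy_contig, [(1, 30), (61, 90), (95, 120)]))
```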

The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below.

A sequence which is suspected of being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures, and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.

As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Staphylococcus aureus sequences. DFs can be readily identified by identifying unique sequences within contigs of the Staphylococcus aureus genome, such as by using well-known computer analysis software, and by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.
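
One way to identify unique sequences computationally is to list the k-mers of a Staphylococcus aureus contig that do not occur, on either strand, in a set of comparison sequences from other organisms. The sketch below does this with exact matching only. The k value and the comparison sequences are hypothetical inputs, and candidates found this way would still need the hybridization or amplification testing described above.

```python
from typing import Iterable, List, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_diagnostic_kmers(target: str, others: Iterable[str], k: int = 20) -> List[str]:
    """k-mers of target absent from every comparison sequence on both strands,
    returned in order of first occurrence in the target."""
    background: Set[str] = set()
    for seq in others:
        seq = seq.upper()
        background |= kmers(seq, k)
        background |= kmers(reverse_complement(seq), k)
    seen: Set[str] = set()
    unique = []
    target = target.upper()
    for i in range(len(target) - k + 1):
        word = target[i:i + k]
        if word not in background and word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


print(candidate_diagnostic_kmers("AAAACCCCGGGG", ["AAAACCC"], k=4))
```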

The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequences provided in SEQ ID NOS:1-5,191, a representative fragment thereof, or a nucleotide sequence at least 99% and preferably 99.9% identical to SEQ ID NOS:1-5,191, with a sequence from another isolate of the same species.

Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated.
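
Whether a codon-substituted sequence still encodes the same polypeptide can be checked by translating both sequences. The sketch below encodes the standard genetic code as a 64-character string indexed in TCAG order. It compares amino acid sequences only, accepts only A, C, G and T in codons, and uses hypothetical function names.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing partial codons are ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3))


def is_synonymous_variant(orf: str, variant: str) -> bool:
    """True if both sequences have the same length and encode the same amino acids."""
    return len(orf) == len(variant) and translate(orf) == translate(variant)


print(translate("ATGAAACTGTAA"), is_synonymous_variant("ATGAAACTGTAA", "ATGAAGTTATAG"))
```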
Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequencing both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Staphylococcus aureus origin isolated by using part or all of the fragments in question as a probe or primer.
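
Sequencing both strands catches errors because the reverse-strand read, once reverse-complemented, should agree base for base with the forward read. The sketch assumes two reads that already span exactly the same region without gaps, an idealization since real reads must first be aligned. It reports the 1-based positions where the reads disagree, which are the positions to target for further sequencing.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_discrepancies(forward_read: str, reverse_read: str) -> List[int]:
    """1-based positions where the forward read and the reverse-complemented
    reverse read disagree."""
    expected = reverse_complement(reverse_read)
    if len(expected) != len(forward_read):
        raise ValueError("reads must cover the same region")
    return [
        i + 1
        for i, (a, b) in enumerate(zip(forward_read.upper(), expected))
        if a != b
    ]


print(strand_discrepancies("ATGAGTAC", "GTACGCAT"))  # disagreement at position 4
```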

Each of the ORFs of the Staphylococcus aureus genome disclosed in Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample, particularly Staphylococcus aureus. Especially preferred in this regard are ORFs such as those of Table 3, which do not match previously characterized sequences from other organisms and thus are most likely to be highly selective for Staphylococcus aureus. Also particularly preferred are ORFs that can be used to distinguish between strains of Staphylococcus aureus, particularly those that distinguish medically important strains, such as drug-resistant strains.

In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary either to a region of the gene involved in transcription, for triple helix formation, or to the mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be effective in model systems, and the requisite procedures are well known and routine. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6: 3073 (1979); Cooney et al., Science 241: 456 (1988); and Dervan et al., Science 251: 1360 (1991). Antisense techniques in general are discussed in, for instance, Okano, J. Neurochem. 56: 560 (1991) and OLIGODEOXYNUCLEOTIDES AS ANTISENSE INHIBITORS OF GENE EXPRESSION, CRC Press, Boca Raton, FL (1988).
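
Because an antisense oligonucleotide pairs antiparallel with its mRNA target, its 5'-to-3' sequence is the reverse complement of the target region. The sketch below derives such an oligonucleotide and enforces the 20 to 40 base length mentioned above. The function name, its `as_rna` option and the example target are illustrative assumptions. Target-site selection and oligonucleotide chemistry are not addressed.

```python
# Minimal sketch: derive an antisense oligonucleotide (written 5'->3') that
# is complementary to an mRNA target region (also given 5'->3').

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_oligo(mrna_region: str, as_rna: bool = False) -> str:
    """Return the reverse complement of the target region as DNA (default)
    or RNA. Only A/C/G/U (or T) are accepted; other symbols raise KeyError."""
    region = mrna_region.upper().replace("U", "T")
    if not 20 <= len(region) <= 40:
        raise ValueError("target region should be 20 to 40 bases long")
    oligo = "".join(PAIR[base] for base in reversed(region))
    return oligo.replace("T", "U") if as_rna else oligo


if __name__ == "__main__":
    target = "AUGGCUAGCUAGGCUAAGCU"  # invented 20-nt mRNA segment
    print(antisense_oligo(target))  # AGCTTAGCCTAGCTAGCCAT
```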
The present invention further provides recombinant constructs comprising one or more fragments of the Staphylococcus aureus genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Staphylococcus aureus genome has been inserted in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including, for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF.

Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Useful bacterial vectors include phagescript, psiX174, pBluescript SK and KS (+ and -), pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene); and pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia). Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene); and pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).

Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retroviruses, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.

The present invention further provides host cells containing any one of the isolated fragments of the Staphylococcus aureus genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell; a lower eukaryotic host cell, such as a yeast cell; or a prokaryotic cell, such as a bacterial cell.
A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present invention, may be introduced into the host by a variety of well-established techniques that are standard in the art, such as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in, for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986).

A host cell containing one of the fragments of the Staphylococcus aureus genomic fragments and contigs of the present invention can be used in a conventional manner to produce the gene product encoded by the isolated fragment (in the case of an ORF) or to produce a heterologous protein under the control of the EMF.

The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the genetic code, encode an identical polypeptide sequence.
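
Whether two sequences are degenerate variants can be checked directly by translating both and comparing the products. The sketch below assumes the standard genetic code and complete, in-frame sequences. The bacterial translation table uses the same codon-to-amino-acid assignments and differs only in alternative start codons. The helper names are hypothetical.

```python
# Minimal sketch: decide whether two coding sequences are degenerate variants,
# i.e. differ in nucleotides but encode the same polypeptide.
# Assumes the standard genetic code and complete, in-frame codons.

BASES = "TCAG"
# Amino acids in TCAG x TCAG x TCAG codon order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an in-frame DNA sequence codon by codon."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ but encode an identical polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


if __name__ == "__main__":
    print(translate("ATGCTGAAA"))                           # MLK
    print(is_degenerate_variant("ATGCTGAAA", "ATGTTAAAG"))  # True
    print(is_degenerate_variant("ATGCTGAAA", "ATGCCGAAA"))  # False
```

In the example, CTG and TTA both encode leucine and AAA and AAG both encode lysine. CCG encodes proline, so the last pair is not a degenerate variant.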

Preferred nucleic acid fragments of the present invention are the ORFs depicted in Tables 2 and 3 which encode proteins.

A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies against the native polypeptide, as discussed further below.

In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain, or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immunoaffinity chromatography.

The polypeptides and proteins of the present invention also can be purified from cells which have been altered to express the desired polypeptide or protein. As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.

Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.

"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems. "Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that of those expressed in mammalian cells.

"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides. Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Staphylococcus aureus genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
"Recombinant expression vehicle or vector" refers to a plasmid, phage, virus or other vector for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of (1) genetic regulatory elements necessary for gene expression in the host, including elements required to initiate and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, for example, promoters and, where necessary, an enhancer and a polyadenylation signal; (2) a structural or coding sequence which is transcribed into mRNA and translated into protein; and (3) appropriate signals to initiate translation at the beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell. Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.

"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.

Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference in its entirety.

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from genes encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), or from genes encoding alpha-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences and, preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, when desirable, provide amplification within the host.

Suitable prokaryotic hosts for transformation include strains of Staphylococcus aureus, E. coli, B. subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus. Others may also be employed as a matter of choice.

As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (available from Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product. Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally by physical or chemical means, and the resulting crude extract is retained for further purification.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23: 175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.

Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.

Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.

An additional aspect of the invention includes Staphylococcus aureus polypeptides which are useful as immunodiagnostic antigens and/or immunoprotective vaccines, collectively "immunologically useful polypeptides". Such immunologically useful polypeptides may be selected from the ORFs disclosed herein based on techniques well known in the art and described elsewhere herein. The inventors have used the following criteria to select several immunologically useful polypeptides:

As is known in the art, an amino terminal type I signal sequence directs a nascent protein across the plasma and outer membranes to the exterior of the bacterial cell. Such outermembrane polypeptides are expected to be immunologically useful.
According to Izard, J. W. et al., Mol. Microbiol. 13, 765-773 (1994), polypeptides containing type I signal sequences have the following physical attributes: the type I signal sequence is approximately 15 to 25 amino acid residues long, is primarily hydrophobic, and carries a net positive charge at the extreme amino terminus; the central region of the signal sequence must adopt an alpha-helical conformation in a hydrophobic environment; and the region surrounding the actual site of cleavage is ideally six residues long, with small side-chain amino acids in the -1 and -3 positions.
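
The sequence-level criteria above can be screened with a simple heuristic. The Python sketch below is not the inventors' algorithm. The residue classes, the five-residue N-terminal window and the 50% hydrophobicity cutoff are illustrative assumptions, the candidate cleavage position must be supplied by the caller, and the alpha-helix requirement is not tested.

```python
# Heuristic screen for type I signal sequence features (illustrative only).
# Residue classes and cutoffs are assumptions; helicity is not assessed.

HYDROPHOBIC = set("AILMFVWC")
SMALL = set("AGSCT")  # small side-chain residues accepted at -1 and -3
POSITIVE, NEGATIVE = set("KR"), set("DE")


def check_type_i_signal(protein: str, cleavage_after: int, n_region: int = 5) -> dict:
    """Evaluate the first `cleavage_after` residues as a candidate type I
    signal sequence, so residue -1 is protein[cleavage_after - 1]."""
    if not 3 <= cleavage_after <= len(protein):
        raise ValueError("cleavage position must leave at least 3 signal residues")
    signal = protein.upper()[:cleavage_after]
    n_term = signal[:n_region]
    net_charge = sum(r in POSITIVE for r in n_term) - sum(r in NEGATIVE for r in n_term)
    hydrophobic_fraction = sum(r in HYDROPHOBIC for r in signal) / len(signal)
    return {
        "length_15_to_25": 15 <= len(signal) <= 25,
        "positive_n_terminus": net_charge > 0,
        "mostly_hydrophobic": hydrophobic_fraction >= 0.5,
        "small_at_minus1_and_minus3": signal[-1] in SMALL and signal[-3] in SMALL,
    }


if __name__ == "__main__":
    candidate = "MKKLLAALLVLSLAAHA" + "DTQKAEVS"  # invented sequence
    # Every check is True for this invented sequence.
    print(check_type_i_signal(candidate, cleavage_after=17))
```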

Also known in the art is the type IV signal sequence, one of several types of functional signal sequences which exist in addition to the type I signal sequence detailed above. Although functionally related, the type IV signal sequence possesses a unique set of biochemical and physical attributes (Strom, M. S. and Lory, S., J. Bacteriol. 174, 7345-7351; 1992). It typically consists of six to eight amino acids with a net basic charge followed by an additional sixteen to thirty primarily hydrophobic residues. The cleavage site of a type IV signal sequence is typically after the initial six to eight amino acids at the extreme amino terminus. In addition, all type IV signal sequences contain a phenylalanine residue at the +1 site relative to the cleavage site.
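
A comparable heuristic for type IV signal sequences looks for a short basic leader followed by a phenylalanine at +1 and a hydrophobic stretch. In the sketch below, the hydrophobic residue set, the 16-residue window and the 60% cutoff are assumptions chosen for illustration. The example is an illustrative pilin-like string, not a sequence from this disclosure, and the function name is hypothetical.

```python
# Heuristic screen for a type IV signal sequence (illustrative only).
# The hydrophobic set, window size and cutoff are assumptions.

HYDROPHOBIC = set("AILMFVWC")


def type_iv_cleavage_candidates(protein: str, min_leader: int = 6, max_leader: int = 8,
                                window: int = 16, min_fraction: float = 0.6) -> list[int]:
    """Return leader lengths (candidate cleavage positions) for which the
    leader has a net basic charge, residue +1 is F, and the following
    `window` residues are mostly hydrophobic."""
    protein = protein.upper()
    candidates = []
    for leader_len in range(min_leader, max_leader + 1):
        if len(protein) < leader_len + window:
            break
        leader = protein[:leader_len]
        net_charge = sum(r in "KR" for r in leader) - sum(r in "DE" for r in leader)
        stretch = protein[leader_len:leader_len + window]  # starts at the +1 residue
        fraction = sum(r in HYDROPHOBIC for r in stretch) / window
        if protein[leader_len] == "F" and net_charge > 0 and fraction >= min_fraction:
            candidates.append(leader_len)
    return candidates


if __name__ == "__main__":
    pilin_like = "MKAQKGFTLIELMIVVAIIGILAAIAIPQY"  # illustrative sequence
    print(type_iv_cleavage_candidates(pilin_like))  # [6]
```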

Studies of the cleavage sites of twenty-six bacterial lipoprotein precursors have allowed the definition of a consensus amino acid sequence for lipoprotein cleavage. Nearly three-fourths of the bacterial lipoprotein precursors examined contained the sequence L-(A,S)-(G,A)-C at positions -3 to +1 relative to the point of cleavage (Hayashi, S. and Wu, H. C., Lipoproteins in bacteria. J. Bioenerg. Biomembr. 22, 451-471; 1990).
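
The lipoprotein consensus translates directly into a regular expression. The sketch below reports the position of the +1 cysteine for each match. Restricting the search to the first 40 residues is an assumption reflecting that signal peptides lie near the amino terminus. The function name and example sequence are invented.

```python
import re

# L-(A,S)-(G,A)-C, spanning positions -3 to +1 around the cleavage site.
LIPOBOX = re.compile(r"L[AS][GA]C")


def lipobox_sites(protein: str, search_limit: int = 40) -> list[int]:
    """Return 1-based positions of the +1 cysteine for each consensus match
    within the first `search_limit` residues (an assumed N-terminal window)."""
    region = protein.upper()[:search_limit]
    # m.start() is the 0-based index of L (-3); the Cys (+1) is 3 residues later.
    return [m.start() + 4 for m in LIPOBOX.finditer(region)]


if __name__ == "__main__":
    print(lipobox_sites("MKKIILSLLTLLLAGCSGNKE"))  # [16]
```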

It is well known that most anchored proteins found on the surface of gram-positive bacteria possess a highly conserved carboxy terminal sequence. More than fifty such proteins from organisms such as S. pyogenes, S. mutans, E. faecalis, S. pneumoniae, and others have been identified based on their extracellular location and carboxy terminal amino acid sequence (Fischetti, V. A., Gram-positive commensal bacteria deliver antigens to elicit mucosal and systemic immunity. ASM News 62, 405-410; 1996). The conserved region consists of six charged amino acids at the extreme carboxy terminus coupled to 15-20 hydrophobic amino acids presumed to function as a transmembrane domain. Immediately adjacent to the transmembrane domain is a six amino acid sequence conserved in nearly all proteins examined. The amino acid sequence of this region is L-P-X-T-G-X, where X is any amino acid.
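
The conserved L-P-X-T-G-X motif can likewise be located with a regular expression. In this sketch, the 50-residue carboxy-terminal window is an assumption, and the adjacent hydrophobic domain and charged tail are not checked. The function name is hypothetical and the example tail is an illustrative carboxy-terminal sequence.

```python
import re

# L-P-X-T-G-X, where X is any amino acid.
SORTING_MOTIF = re.compile(r"LP.TG.")


def find_lpxtg(protein: str, c_terminal_window: int = 50) -> list[int]:
    """Return 1-based start positions of L-P-X-T-G-X motifs found within the
    last `c_terminal_window` residues (window size is an assumption)."""
    protein = protein.upper()
    offset = max(0, len(protein) - c_terminal_window)
    return [offset + m.start() + 1 for m in SORTING_MOTIF.finditer(protein[offset:])]


if __name__ == "__main__":
    tail = "LPETGEENPFIGTTVFGGLSLALGAALLAGRRREL"  # illustrative C-terminus
    print(find_lpxtg("A" * 30 + tail))  # [31]
```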

Amino acid sequence similarity to proteins of known function, as determined by BLAST, enables the assignment of putative functions to novel amino acid sequences and allows for the selection of proteins thought to function outside the cell wall. Such proteins are well known in the art and include those annotated as "lipoprotein", "periplasmic", or "antigen".

An algorithm for selecting antigenic and immunogenic Staphylococcus aureus polypeptides including the foregoing criteria was developed by the present inventors. Use of the algorithm by the inventors to select immunologically useful Staphylococcus aureus polypeptides resulted in the selection of several ORFs which are predicted to be outermembrane-associated proteins. These proteins are identified in Table 4, below, and shown in the Sequence Listing as SEQ ID NOS:5,192 to 5,255. Thus the amino acid sequence of each of the antigenic Staphylococcus aureus polypeptides listed in Table 4 can be determined, for example, by locating the amino acid sequence of the ORF in the Sequence Listing. Likewise, the polynucleotide sequence encoding each ORF can be found by locating the corresponding polynucleotide SEQ ID in Tables 1, 2, or 3, and finding the corresponding nucleotide sequence in the Sequence Listing.

As will be appreciated by those of ordinary skill in the art, although a polypeptide representing an entire ORF may be the closest approximation to a protein found in vivo, it is not always technically practical to express a complete ORF in vitro. It may be very challenging to express and purify a highly hydrophobic protein by common laboratory methods. As a result, the immunologically useful polypeptides described herein as SEQ ID NOS:5,192-5,255 may have been modified slightly to simplify the production of recombinant protein, and are the preferred embodiments. In general, nucleotide sequences which encode highly hydrophobic domains, such as those found at the amino terminal signal sequence, are excluded for enhanced in vitro expression of the polypeptides. Furthermore, any highly hydrophobic amino acid sequences occurring at the carboxy terminus are also excluded. Such truncated polypeptides include, for example, the mature forms of the polypeptides expected to exist in nature.
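
One common way to locate highly hydrophobic domains of the kind described above is a Kyte-Doolittle hydropathy scan. The sketch below flags windows whose mean hydropathy exceeds a cutoff and merges overlapping windows into spans. The 19-residue window and 1.6 cutoff are the values commonly cited for candidate membrane-spanning segments, not parameters taken from this disclosure, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy scan (Kyte & Doolittle, 1982 scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return merged 1-based (start, end) spans covered by windows whose mean
    hydropathy exceeds `threshold`. Non-standard residues raise KeyError."""
    protein = protein.upper()
    spans: list[tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        mean = sum(KYTE_DOOLITTLE[r] for r in protein[i:i + window]) / window
        if mean > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 19 + "K" * 10  # invented: hydrophobic core, charged flanks
    print(hydrophobic_segments(toy))  # [(6, 34)]
```

A span that begins near residue 1 or ends near the last residue would mark a terminal hydrophobic region of the kind that may be left out of an expression construct. Because whole windows are reported, span boundaries extend a few residues past the purely hydrophobic core.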






Those of ordinary skill in the art can identify soluble portions of the polypeptides identified in Table 4 and, in the case of the truncated polypeptide sequences shown as SEQ ID NOS:5,192-5,255, may obtain the complete predicted amino acid sequence of each polypeptide by translating the polynucleotide sequence of the corresponding ORF listed in Tables 1, 2 and 3 and found in the Sequence Listing.

Accordingly, immunologically useful polypeptides comprising the complete amino acid sequence of a polypeptide selected from the group of polypeptides encoded by the ORFs identified in Table 4, and immunologically useful polypeptides comprising an amino acid sequence selected from the group of amino acid sequences shown in the Sequence Listing as SEQ ID NOS:5,192-5,255, form an embodiment of the invention. In addition, polynucleotides encoding the foregoing polypeptides also form part of the present invention.

In another aspect, the invention provides a peptide or polypeptide comprising an epitope-bearing portion of a polypeptide of the invention, particularly those epitope-bearing portions (antigenic regions) identified in Table 4. The epitope-bearing portion is an immunogenic or antigenic epitope of a polypeptide of the invention. An "immunogenic epitope" is defined as a part of a protein that elicits an antibody response when the whole protein is the immunogen. On the other hand, a region of a protein molecule to which an antibody can bind is defined as an "antigenic epitope." The number of immunogenic epitopes of a protein generally is less than the number of antigenic epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic epitope (i.e., that contain a region of a protein molecule to which an antibody can bind), it is well known in the art that relatively short synthetic peptides that mimic part of a protein sequence are routinely capable of eliciting an antiserum that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G., Shinnick, T. M., Green, N. and Lerner, R. A. (1983) "Antibodies that react with predetermined sites on proteins", Science 219:660-666. Peptides capable of eliciting protein-reactive sera are frequently represented in the primary sequence of a protein, can be characterized by a set of simple chemical rules, and are confined neither to immunodominant regions of intact proteins (i.e., immunogenic epitopes) nor to the amino or carboxyl termini.
Antigenic epitope-bearing peptides and polypeptides of the invention are therefore useful to raise antibodies, including monoclonal antibodies, that bind specifically to a polypeptide of the invention. See, for instance, Wilson et al., Cell 37:767-778 (1984) at 777.

Antigenic epitope-bearing peptides and polypeptides of the invention preferably contain a sequence of at least seven, more preferably at least nine, and most preferably between about 15 and about 30 amino acids contained within the amino acid sequence of a polypeptide of the invention. Non-limiting examples of antigenic polypeptides or peptides that can be used to generate S. aureus-specific antibodies include polypeptides comprising the peptides shown in Table 4 below. These polypeptide fragments have been determined to bear antigenic epitopes of the indicated S. aureus proteins by analysis of the Jameson-Wolf antigenic index, a representative sample of which is shown in Figure 3.

The epitope-bearing peptides and polypeptides of the invention may be produced by any conventional means. See, e.g., Houghten, R. A. (1985) General method for the rapid solid-phase synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the level of individual amino acids. Proc. Natl. Acad. Sci. USA 82:5131-5135; this "Simultaneous Multiple Peptide Synthesis (SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten et al. (1986).

Epitope-bearing peptides and polypeptides of the invention are used to induce antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910-914; and Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985).

Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a protein that elicit an antibody response when the whole protein is the immunogen, are identified according to methods known in the art. See, for instance, Geysen et al., supra. Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a general method of detecting or determining the sequence of monomers (amino acids or other compounds) which is a topological equivalent of the epitope (i.e., a "mimotope") which is complementary to a particular paratope (antigen binding site) of an antibody of interest. More generally, U.S. Patent No. 4,433,092 to Geysen (1989) describes a method of detecting or determining a sequence of monomers which is a topographical equivalent of a ligand which is complementary to the ligand binding site of a particular receptor of interest. Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on Peralkylated Oligopeptide Mixtures discloses linear C1-C7-alkyl peralkylated oligopeptides and sets and libraries of such peptides, as well as methods for using such oligopeptide sets and libraries for determining the sequence of a peralkylated oligopeptide that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of the epitope-bearing peptides of the invention also can be made routinely by these methods.

Table 4 lists immunologically useful polypeptides identified by an algorithm which locates novel Staphylococcus aureus outermembrane proteins, as is described above. Also listed are epitopes or "antigenic regions" of each of the identified polypeptides. The antigenic regions, or epitopes, are delineated by two numbers x-y, where x is the number of the first amino acid in the open reading frame included within the epitope and y is the number of the last amino acid in the open reading frame included within the epitope. For example, the first epitope in ORF 168-6 consists of amino acids 36 to 45 of SEQ ID NO:5,192, as is described in Table 4. The inventors have identified several epitopes for each of the antigenic polypeptides identified in Table 4. Accordingly, forming part of the present invention are polypeptides comprising an amino acid sequence of one or more antigenic regions identified in Table 4. The invention further provides polynucleotides encoding such polypeptides.
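
Because the x-y notation is 1-based and inclusive, extracting an antigenic region from a polypeptide sequence is a simple slice, and the region contains y - x + 1 residues (so amino acids 36 to 45 span ten residues). The sketch below uses a hypothetical function name and an invented example sequence.

```python
def epitope_peptide(protein: str, region: str) -> str:
    """Return residues x..y (1-based, inclusive) for a region written "x-y"."""
    x_text, y_text = region.split("-")
    x, y = int(x_text), int(y_text)
    if not 1 <= x <= y <= len(protein):
        raise ValueError(f"region {region} lies outside a {len(protein)}-residue sequence")
    return protein[x - 1:y]  # Python slices are 0-based and end-exclusive


if __name__ == "__main__":
    example = "MKKLLAALLVLSLAAHADTQKAEVS"  # invented sequence
    print(epitope_peptide(example, "3-7"))  # KLLAA
```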

The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described. As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.
The invention further provides methods of obtaining homologs, from other strains of Staphylococcus aureus, of the fragments of the Staphylococcus aureus genome of the present invention and of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Staphylococcus aureus is defined as a homolog of a fragment of the Staphylococcus aureus fragments or contigs, or of a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Staphylococcus aureus genome of the present invention or to a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than 85% sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among especially preferred homologs, those with 95% or more homology are particularly preferred. Very particularly preferred among these are those with 97% or more homology, and even more particularly preferred among those are homologs with 99% or more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood that, among measures of homology, identity is particularly preferred in this regard.
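
Percent identity can be computed from a pairwise alignment and then mapped onto the preference tiers above. The sketch below assumes the alignment has already been produced, for example by BLAST or a global aligner. It counts identical, non-gap columns over all columns that are not gaps in both sequences. Other normalizations exist and give different values, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over alignment columns that are not gaps in both sequences;
    a gap opposite a residue counts as a mismatch."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(a == b and a != gap for a, b in columns)
    return 100.0 * matches / len(columns)


def homology_tier(pct: float) -> str:
    """Map a percentage onto the preference tiers described in the text."""
    for threshold, label in ((99.9, ">= 99.9%"), (99.0, ">= 99%"), (97.0, ">= 97%"),
                             (95.0, ">= 95%"), (93.0, ">= 93%")):
        if pct >= threshold:
            return label
    if pct > 90.0:
        return "> 90%"
    if pct > 85.0:
        return "> 85% (significant homology)"
    return "below the significant-homology threshold"


if __name__ == "__main__":
    pct = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pct, homology_tier(pct))  # 90.0 > 85% (significant homology)
```

Exactly 90% falls in the "greater than 85%" tier here because the text requires more than 90% for the next tier.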

Region-specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS:1-5,191, or from a nucleotide sequence at least 95%, particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ ID NOS:1-5,191, can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have been described in great detail in many publications such as, for example, Innis et al., PCR PROTOCOLS, Academic Press, San Diego, CA (1990).
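
The derivation of region-specific primers can be sketched in Python as follows. The 40-nucleotide template, the coordinates and the 10-base primer length are hypothetical placeholders rather than sequences from SEQ ID NOS:1-5,191, the Wallace rule (2 °C per A or T plus 4 °C per G or C) is only a rough melting-temperature estimate for short oligonucleotides, and the function names are illustrative.

```python
# Sketch only: derive a forward/reverse primer pair from a chosen region of a
# template and report GC content and a rough Wallace-rule Tm. The template,
# coordinates and primer length are hypothetical inputs; practical primer
# design would also check specificity, secondary structure and so on.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int, length: int = 20):
    """Forward primer = first `length` bases of template[start:end];
    reverse primer = reverse complement of its last `length` bases."""
    region = template[start:end].upper()
    if len(region) < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    return region[:length], reverse_complement(region[-length:])


def gc_content(seq: str) -> float:
    """Percent G + C in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm (deg C) for short oligos: 2 x (A+T) + 4 x (G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


template = "ATGC" * 10  # hypothetical 40-nt stand-in for a region of interest
fwd, rev = primer_pair(template, 0, 40, length=10)
print(fwd, rev, wallace_tm(fwd))  # ATGCATGCAT GCATGCATGC 28
```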

When using primers derived from SEQ ID NOS:1-5,191, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1-5,191, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous to the primer will also be amplified.

When using DNA probes derived from SEQ ID NOS:1-5,191, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS:1-5,191, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.
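
For reference, the example primer and probe conditions given in the two preceding paragraphs can be tabulated as data. The following Python sketch only restates those values, with buffer designations copied exactly as written; the record fields and the `lookup` helper are hypothetical conveniences.

```python
# Sketch only: the example stringency conditions recited above, stored as data
# so they can be looked up by application and stringency level. Values are
# copied from the text; the record layout is an illustrative choice.
from dataclasses import dataclass


@dataclass(frozen=True)
class StringencyCondition:
    application: str    # "primer" (PCR) or "probe" (colony/plaque hybridization)
    level: str          # "high" or "low"
    hybridization: str  # annealing/hybridization step
    wash: str
    homology: str       # homology to the primer/probe still recovered


CONDITIONS = (
    StringencyCondition("primer", "high", "50-60°C, 6X SSPC, 50% formamide",
                        "50-65°C, 0.5X SSPC", "greater than 75%"),
    StringencyCondition("primer", "low", "35-37°C, 5X SSPC, 40-45% formamide",
                        "42°C, 0.5X SSPC", "greater than 40-50%"),
    StringencyCondition("probe", "high", "50-65°C, 5X SSPC, 50% formamide",
                        "50-65°C, 0.5X SSPC", "greater than 90%"),
    StringencyCondition("probe", "low", "35-37°C, 5X SSPC, 40-45% formamide",
                        "42°C, 0.5X SSPC", "greater than 35-45%"),
)


def lookup(application: str, level: str) -> StringencyCondition:
    """Return the tabulated condition for an application and stringency level."""
    for condition in CONDITIONS:
        if condition.application == application and condition.level == level:
            return condition
    raise KeyError((application, level))


print(lookup("probe", "high").homology)  # greater than 90%
```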

Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Staphylococcus aureus.

ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION

Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one skilled in the art to use the Staphylococcus aureus ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial use of enzymes: BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., Macmillan Publications, Ltd., NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar aspects of the present invention are discussed below.

1. Biosynthetic Enzymes

Open reading frames encoding proteins involved in mediating the catalytic reactions of intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions can be used for industrial biosynthesis. Such proteins include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration (both aerobic and anaerobic), enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis.

The various metabolic pathways present in Staphylococcus aureus can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1-3 and SEQ ID NOS:1-5,191.

Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as in non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.

Proteolytic enzymes are another class of commercially important enzymes. Proteolytic enzymes find use in a number of industrial processes, including the processing of flax and other vegetable fibers, the extraction, clarification and depectinization of fruit juices, the extraction of vegetable oils, and the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et al., Symbiosis 21: 79 (1986) and Voragen et al., in BIOCATALYSTS IN AGRICULTURAL BIOTECHNOLOGY, Whitaker et al., Eds., American Chemical Society Symposium Series 382: 93 (1989).

The metabolism of sugars is an important aspect of the primary metabolism of Staphylococcus aureus. Enzymes involved in the degradation of sugars, particularly glucose, galactose, fructose and xylose, can be used in industrial fermentation. Some of the important sugar-transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase. Other metabolic enzymes have found commercial use, such as glucose oxidases, which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using Reichstein's procedure, as described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).

Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer. See, for instance, Hartmeir et al., Biotechnology Letters 1: 21 (1979). The most important application of GOD is the industrial-scale production of gluconic acid by fermentation. Gluconic acids are marketed for use in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industries, as described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS IN FUNGI, Bennett et al., Eds., Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for the quantitative determination of glucose in body fluids and, more recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This application is described, for instance, in Owusu et al., Biochim. Biophys. Acta 872: 83 (1986).

The main sweetener used in the world today is sugar, which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used, and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associates, Inc., Sunderland, Massachusetts (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).

Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, J., Ed., Plenum Press, New York (1977); Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983); and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner & Associates, London (1986).)

Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Royal Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemists' Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase-catalyzed interesterification of readily available triglycerides. Applications of lipases also include use as a detergent additive to facilitate the removal of fats from fabrics in the course of washing procedures.

The use of enzymes, and in particular microbial enzymes, as catalysts for key steps in the synthesis of complex organic molecules is rapidly gaining popularity. One area of great interest is the preparation of chiral intermediates. Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990).) The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles; esterification reactions; trans-esterification reactions; synthesis of amides; reduction of alkanones and oxoalkanoates; oxidation of alcohols to carbonyl compounds; oxidation of sulfides to sulfoxides; and carbon bond forming reactions such as the aldol reaction.

When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis, it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole-cell system on the one hand, or an isolated, partially purified enzyme on the other, have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.

Aminotransferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantage of using microbial enzyme systems is that the aminotransferase enzymes catalyze the stereoselective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of aminotransferases for amino acid production is provided by Roselle-David, Methods in Enzymology 136:479 (1987).

Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination. A variety of commercially important enzymes have previously been isolated from Staphylococcus aureus. These include Sau3A and Sau96I.

2. Generation of Antibodies

As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies and humanized forms.

The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

In general, techniques for preparing polyclonal and monoclonal antibodies, as well as hybridomas capable of producing the desired antibody, are well known in the art (Campbell, A. M., MONOCLONAL ANTIBODY TECHNOLOGY: LABORATORY TECHNIQUES IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35: 1-21 (1980); Kohler and Milstein, Nature 256: 495-497 (1975)), as are the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4: 72 (1983); Cole et al., in MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc. (1985), pp. 77-96).

Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.

The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or galactosidase) or the inclusion of an adjuvant during immunization.

For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody-producing hybridoma cells.

Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175: 109-124 (1988)).

Hybridomas secreting the desired antibodies are cloned, and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).

Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.

For polyclonal antibodies, antibody-containing antiserum is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.

The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labelling are well known in the art (see, for example, Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976)).

The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Staphylococcus aureus genome is expressed.

The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.

3. Diagnostic Assays and Kits

The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or a homolog thereof, in a test sample, using one of the DFs, antigens or antibodies of the present invention.

In detail, such methods comprise incubating a test sample with one or more of the antibodies, one or more of the DFs, or one or more antigens of the present invention and assaying for binding of the DFs, antigens or antibodies to components within the test sample.

Conditions for incubating a DF, antigen or antibody with a test sample vary. Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs, antigens or antibodies of the present invention. Examples of such assays can be found in Chard, T., An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985); and PCT publication WO95/32291, all of which are hereby incorporated herein by reference.

The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, the nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention. Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers, comprising: (a) a first container comprising one of the DFs, antigens or antibodies of the present invention; and (b) one or more other containers comprising one or more of the following: wash reagents and reagents capable of detecting the presence of a bound DF, antigen or antibody.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris buffers, etc.), and containers which contain the reagents used to detect the bound antibody, antigen or DF.

Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or, in the alternative, if the primary antibody is labelled, the enzymatic or antibody binding reagents which are capable of reacting with the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs, antigens and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.

4. Screening Assay for Binding Agents

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the Staphylococcus aureus fragments and contigs herein described.

In general, such methods comprise the steps of:

(a) contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Staphylococcus aureus genome; and

(b) determining whether the agent binds to said protein or said fragment.

The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random, or rationally selected or designed using protein modeling techniques.

For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.
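
A minimal Python sketch of the random-screening workflow follows. `measure_binding` is a hypothetical placeholder for whatever assay readout is used in step (b), the signal values are made up for illustration, and the three-fold-over-background hit threshold is an assumption rather than a criterion stated in this specification.

```python
# Sketch only: (a) contact each agent with the isolated protein and (b) decide
# whether it binds. `measure_binding` stands in for a hypothetical assay
# readout; the fold-over-background cutoff is an assumed hit-calling rule.
import random
from typing import Callable, List, Sequence


def pick_random_agents(library: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Random selection of k candidate agents from a larger library."""
    return random.Random(seed).sample(list(library), k)


def screen(agents: Sequence[str],
           measure_binding: Callable[[str], float],
           background: float,
           fold_over_background: float = 3.0) -> List[str]:
    """Return the agents whose binding signal exceeds the cutoff."""
    cutoff = background * fold_over_background
    return [agent for agent in agents if measure_binding(agent) > cutoff]


# Hypothetical assay readouts for three candidate agents.
signals = {"peptide-1": 0.5, "peptide-2": 4.2, "carbohydrate-1": 1.0}
hits = screen(list(signals), lambda agent: signals[agent], background=1.0)
print(hits)  # ['peptide-2']
```

In practice the same `screen` call applies whether the candidate list comes from `pick_random_agents` or from a rationally designed set.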

Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides (see, for example, Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989)), or pharmaceutical agents, or the like.

In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence-specific or element-specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.

One class of DNA binding agents comprises agents which contain base residues that hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity.

Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251: 1360 (1991)) or to the mRNA itself (antisense; see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides, and other DNA binding agents.
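
The antisense design principle can be illustrated with a short Python sketch that returns a DNA oligonucleotide, written 5' to 3', complementary to a chosen window of an mRNA. It enforces the 20 to 40 base length mentioned above; the input sequence and function name are hypothetical, and the sketch does not address triple-helix design, backbone chemistry, or target accessibility.

```python
# Sketch only: an antisense DNA oligonucleotide (5'->3') complementary to a
# chosen window of an mRNA. The 20-40 base limit follows the text; the input
# sequence and window position are hypothetical.

_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Reverse complement of mrna[start:start + length], reading U as T."""
    if not 20 <= length <= 40:
        raise ValueError("length outside the 20-40 base range")
    target = mrna.upper().replace("U", "T")[start:start + length]
    if len(target) != length:
        raise ValueError("window runs past the end of the mRNA")
    if any(base not in _PAIR for base in target):
        raise ValueError("unexpected base in target window")
    # Reading the target backwards and complementing gives the 5'->3' antisense strand.
    return "".join(_PAIR[base] for base in reversed(target))


mrna = "AUGC" * 10  # hypothetical 40-nt mRNA fragment
print(antisense_oligo(mrna, 0))  # GCATGCATGCATGCATGCAT
```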

5. Pharmaceutical Compositions and Vaccines

The present invention further provides pharmaceutical agents which can be used to modulate the growth or pathogenicity of Staphylococcus aureus, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.

As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Staphylococcus aureus or a related organism, in vivo or in vitro" when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity of an organism in many ways, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth or pathogenicity by binding to an important protein, thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more susceptible to the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of vaccines derived from membrane-associated polypeptides are well known in the art. The inventors have identified particularly preferred immunogenic Staphylococcus aureus polypeptides for use as vaccines. Such immunogenic polypeptides are described above and summarized in Table 4, below.

As used herein, "related organism" is a broad term which refers to any organism whose growth or pathogenicity can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or of the protein used as a vaccine. As such, related organisms need not be bacterial but may be fungal or viral pathogens.

The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for the treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight, and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of administration, symptoms, etc.
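
The per-kilogram dosing arithmetic above can be sketched in Python. It only illustrates the multiplication and range check; the 70 kg body weight and 10 mg/kg dose in the example are hypothetical values, the default bounds restate the general range given in this paragraph, and the function names are illustrative.

```python
# Sketch only: per-kilogram dosing arithmetic. Body weight and dose are
# hypothetical inputs; the default bounds repeat the general range in the text
# (at least about 1 mg/kg, not in excess of about 1 g/kg per day).


def daily_dose_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Total daily amount in mg for a given per-kg dose and body weight."""
    if dose_mg_per_kg < 0 or body_weight_kg <= 0:
        raise ValueError("dose must be non-negative and body weight positive")
    return dose_mg_per_kg * body_weight_kg


def within_general_range(dose_mg_per_kg: float,
                         low_mg_per_kg: float = 1.0,
                         high_mg_per_kg: float = 1000.0) -> bool:
    """True if the per-kg daily dose lies inside [low, high]."""
    return low_mg_per_kg <= dose_mg_per_kg <= high_mg_per_kg


print(daily_dose_mg(10.0, 70.0))  # 700.0 (10 mg/kg for a 70 kg subject)
print(within_general_range(0.5))  # False
```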

The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity of the molecule, or eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980), cited elsewhere herein.

For example, such moieties may change an immunological character of the functional derivative, such as affinity for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers also may be effected in this way and can be assayed by methods well known to the skilled artisan.

The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., by inhalation, or intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.

In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by combining agents of the present invention with one another or with another agent.

As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either (1) the physiological effects of each compound, or (2) the serum concentrations of each compound can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.

The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.

The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.

The agents of the present invention are administered to a subject, such as a mammal or a patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmaceutically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.

The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, A., Ed., Mack Publishing, Easton PA (1980). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.

Additional pharmaceutical methods may be employed to control the duration of action. Controlled-release preparations may be achieved through the use of polymers to complex or adsorb one or more of the agents of the present invention. The controlled delivery may be effectuated by a variety of well-known techniques, including formulation with macromolecules such as, for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylene-vinyl acetate, methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent in the formulation, and appropriate use of methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible method to control the duration of action by controlled-release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene-vinyl acetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization (for example, hydroxymethylcellulose or gelatin microcapsules and poly(methyl methacrylate) microcapsules, respectively), or in colloidal drug delivery systems (for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules) or in macroemulsions. Such techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).

The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.

6. Shotgun Approach to Megabase DNA Sequencing

The present invention further demonstrates that a large sequence can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, has eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.

Certain aspects of the present invention are described in greater detail in the examples that follow. The examples are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the inventors, as will be clear to those of skill in the art from reading the present disclosure.

ILLUSTRATIVE EXAMPLES

LIBRARIES AND SEQUENCING

1. Shotgun Sequencing Probability Analysis

The overall strategy for a shotgun approach to whole-genome sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According to this treatment, the probability, $P_0$, that any given base in a sequence of size $L$, in nucleotides, is not sequenced after a certain amount, $n$, in nucleotides, of random sequence has been determined can be calculated by the equation $P_0 = e^{-m}$, where $m = n/L$ is the "fold coverage." For instance, for a genome of 2.8 Mb, $m = 1$ when 2.8 Mb of sequence has been randomly generated (1X coverage). At that point, $P_0 = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same as the probability that any region of the whole sequence $L$ has not been determined and, therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of size $L$, in nucleotides, has not been sequenced. When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to 0.0067, or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with an average sequence read length of 410 bp.
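The Poisson estimate above can be checked numerically. The following minimal Python sketch evaluates P0 = exp(-m) with m = n/L for the 2.8 Mb example; the function name is illustrative only, while the clone count and read length are taken from the text.

```python
import math

def unsequenced_fraction(sequenced_nt: float, genome_nt: float) -> float:
    """Lander-Waterman: P0 = exp(-m), where m = n / L is the fold coverage."""
    m = sequenced_nt / genome_nt
    return math.exp(-m)

L = 2.8e6                       # genome size, nt
n_from_clones = 17_000 * 2 * 410  # 17,000 clones, both ends, 410 bp reads (~13.9 Mb)

for label, n in [("1X", L), ("14 Mb", 14e6), ("17,000 clones", n_from_clones)]:
    p0 = unsequenced_fraction(n, L)
    print(f"{label:>13}: coverage {n / L:.2f}X, unsequenced fraction {p0:.4f}")
```

With 17,000 two-ended clones at 410 bp, coverage comes out just under 5X, so the unsequenced fraction is slightly above the 0.67% obtained for exactly 14 Mb.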

Similarly, the total gap length, $G$, is determined by the equation $G = Le^{-m}$, and the average gap size, $g$, follows the equation $g = L/N$, where $N$ is the number of sequence reads. Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a sequence of a polynucleotide 2.8 Mb long.

The treatment above is essentially that of Lander and Waterman, Genomics 2:231 (1988).
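The gap estimates can be sketched the same way. This Python example assumes, as in the Lander and Waterman treatment, that reads start at independent, uniformly random positions; it uses G = L·e^(-m) for the total gap length and g = L/N for the mean gap size, where N is the number of reads (34,000 for 17,000 clones read from both ends). The function name is hypothetical.

```python
import math

def gap_statistics(genome_length: float, num_reads: int, read_length: float):
    """Lander-Waterman estimates for random shotgun sequencing.

    Returns (fold coverage m, total gap length G, mean gap size g, expected gap count).
    """
    m = num_reads * read_length / genome_length   # m = n / L
    total_gap = genome_length * math.exp(-m)      # G = L e^{-m}
    mean_gap = genome_length / num_reads          # g = L / N
    gap_count = total_gap / mean_gap              # equals N e^{-m}
    return m, total_gap, mean_gap, gap_count

# 17,000 clones read from both ends (34,000 reads) of 410 bp over a 2.8 Mb genome
m, G, g, count = gap_statistics(2.8e6, 34_000, 410)
print(f"coverage {m:.2f}X, total gap {G:,.0f} bp, mean gap {g:.0f} bp, ~{count:.0f} gaps")
```

The mean gap comes out at about 82 bp, and the expected number of gaps, N·e^(-m), is in the low hundreds, the same order as the estimate given above.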

2. Random Library Construction

In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this end.

Staphylococcus aureus DNA was prepared by phenol extraction. A mixture containing 600 µg DNA in 3.3 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 30% glycerol was sonicated for 1 min at 0°C in a Branson Model 450 Sonicator at the lowest energy setting using a 3 mm probe. The sonicated DNA was ethanol-precipitated and redissolved in 500 µl TE buffer.

To create blunt ends, a 100 µl aliquot of the resuspended DNA was digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA was phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size was excised from the gel, the LGT agarose was melted, and the resulting solution was extracted with phenol to separate the agarose from the DNA. The DNA was ethanol-precipitated and redissolved in 20 µl of TE buffer for ligation to vector.

A two-step ligation procedure was used to produce a plasmid library with 97% inserts, of which >99% were single inserts. The first ligation mixture (50 µl) contained 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL), and was incubated at 14°C for 4 hr. The ligation mixture was then phenol-extracted and ethanol-precipitated, and the precipitated DNA was dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder were visualized by ethidium bromide staining and UV illumination and identified by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. The portion of the gel containing v+i DNA was excised, and the v+i DNA was recovered and resuspended in 20 µl TE. The v+i DNA was then blunt-ended by T4 polymerase treatment for 5 min at 37°C in a reaction mixture (50 µl) containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol precipitation, the repaired v+i linears were dissolved in 20 µl TE. The final ligation to produce circles was carried out in a 50 µl reaction containing 5 µl of v+i linears and 5 units of T4 ligase at 14°C overnight. After 10 min at 70°C the following day, the reaction mixture was stored at -20°C.
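The band assignments in the first ligation follow from simple size arithmetic. The sketch below lists the expected size ranges of the linear ligation products, assuming the standard 2,686 bp length of pUC18 and the 1.6-2.0 kb insert window selected earlier; it ignores insert-insert and vector-vector products, and the names are hypothetical.

```python
PUC18_BP = 2686           # length of the pUC18 cloning vector, bp
INSERT_BP = (1600, 2000)  # size-selected insert range from the text, bp

def ladder_bands(max_inserts: int = 3) -> dict[str, tuple[int, int]]:
    """Expected (min, max) sizes of the linear products i, v, v+i, v+2i, ..."""
    lo, hi = INSERT_BP
    bands = {"i": (lo, hi), "v": (PUC18_BP, PUC18_BP)}
    for k in range(1, max_inserts + 1):
        label = "v+i" if k == 1 else f"v+{k}i"
        bands[label] = (PUC18_BP + k * lo, PUC18_BP + k * hi)
    return bands

for label, (lo, hi) in ladder_bands().items():
    print(f"{label:>5}: {lo}-{hi} bp")
```

Because the v+i range ends well below the start of the v+2i range, the v+i band can be excised without picking up double-insert molecules.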

This two-stage procedure resulted in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras (<1%) or free vector (<3%).

Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all recombination and restriction functions (A. Greener, Strategies 3(1):5 (1990)) were used to prevent rearrangements, deletions, and loss of clones by restriction. Furthermore, transformed cells were plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase, which allows multiplication and selection of the most rapidly growing cells.

Plating was carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) was thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol was added to the aliquot of cells to a final concentration of 25 mM. Cells were incubated on ice for 10 min. A 1 µl aliquot of the final ligation was added to the cells and incubated on ice for 30 min. The cells were heat-pulsed for 30 sec at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture was eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead, the transformation mixture was plated directly on a nutrient-rich SOB plate containing a 5 ml bottom layer of SOB agar (5% SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal (2%), 1 ml MgCl2 (1 M), and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer was poured just prior to plating. Our titer was approximately 100 colonies per 10 µl aliquot of transformation.
9 transformation. 

All colonies were picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA or deleterious gene products would be deleted from the library, resulting in a slight increase in gap number over that expected.

3. Random DNA Sequencing

High-quality double-stranded DNA plasmid templates were prepared using an alkaline lysis method developed in collaboration with 5 Prime → 3 Prime, Inc. (Boulder, CO). Plasmid preparation was performed in a 96-well format for all stages of DNA preparation, from bacterial growth through final DNA purification. Average template concentration was determined by running 25% of the samples on an agarose gel. DNA concentrations were not adjusted.

Templates were also prepared from a Staphylococcus aureus lambda genomic library. An unamplified library was constructed in the Lambda DASH II vector (Stratagene). Staphylococcus aureus DNA (>100 kb) was partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min at 23°C. The digested DNA was phenol-extracted and centrifuged over a 10-40% sucrose gradient. Fractions containing genomic DNA of 15-25 kb were recovered by precipitation. One µl of fragments was used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture was used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract. Phage were plated directly without amplification from the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield was about 2.5 × 10^9 pfu/µl.

An amplified library was prepared from the primary packaging mixture according to the manufacturer's protocol. The amplified library is stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1 × 10^9 pfu/ml.

Mini-liquid lysates (0.1 µl) are prepared from randomly selected plaques, and template is prepared by long-range PCR. Samples are PCR-amplified using modified T3 and T7 primers and Elongase Supermix (LTI).

Sequencing reactions are carried out on plasmid templates using a combination of two workstations (BIOMEK 1000 and Hamilton Microlab 2200) and the Perkin-Elmer 9600 thermocycler with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers. Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. Modified T7 and T3 primers are used to sequence the ends of the inserts from the Lambda DASH II library. Sequencing reactions are run on a combination of ABI 373 and ABI 377 DNA sequencers. All of the dye terminator sequencing reactions are analyzed using the 2X 9-hour module on the ABI 377. Dye primer reactions are analyzed on a combination of ABI 373 and ABI 377 DNA sequencers. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
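The success rates and usable read lengths above determine how many reactions a given coverage requires. This Python sketch assumes every reaction is of one type and reuses the 2.8 Mb genome size and 5X target from the probability analysis; the function names are hypothetical.

```python
import math

# (success rate, average usable read length in bp), as reported in the text
READ_TYPES = {
    "M13-21 (dye primer)": (0.85, 485),
    "M13RP1 (dye primer)": (0.85, 445),
    "dye terminator": (0.65, 375),
}

def usable_bp_per_reaction(success_rate: float, read_length: float) -> float:
    """Expected usable bases per attempted reaction."""
    return success_rate * read_length

def reactions_for_coverage(genome_bp: float, coverage: float,
                           success_rate: float, read_length: float) -> int:
    """Attempted reactions needed to generate coverage * genome_bp usable bases."""
    return math.ceil(coverage * genome_bp / usable_bp_per_reaction(success_rate, read_length))

for name, (rate, length) in READ_TYPES.items():
    n = reactions_for_coverage(2.8e6, 5, rate, length)
    print(f"{name}: {usable_bp_per_reaction(rate, length):.0f} bp/reaction, "
          f"{n:,} reactions for 5X of 2.8 Mb")
```

For M13-21 dye-primer reads, 5X of 2.8 Mb needs roughly 34,000 attempted reactions, about the number of end reads obtained from 17,000 two-ended clones.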

4. Protocol for Automated Cycle Sequencing

The sequencing was carried out using Hamilton Microstation 2200, Perkin-Elmer 9600 thermocyclers, and ABI 373 and ABI 377 Automated DNA Sequencers. The Hamilton combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently labelled sequencing primers, and reaction buffer. Reaction mixes and templates were combined in the wells of a 96-well thermocycling plate and transferred to the Perkin-Elmer 9600 thermocycler. Thirty consecutive cycles of linear amplification (i.e., one-primer synthesis) were performed, each including denaturation, annealing of primer and template, and extension, i.e., DNA synthesis. A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.

Two sequencing protocols were used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer was labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 or 377 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.

Thirty-two reactions were loaded per ABI 373 Sequencer each day, and 96 samples can be loaded on an ABI 377 per day. Electrophoresis was run overnight (ABI 373) or for 2 1/2 hours (ABI 377) following the manufacturer's protocols.

Following electrophoresis and fluorescence detection, the ABI 373 or ABI 377 performs automatic lane tracking and base-calling. The lane-tracking was confirmed visually. Each sequence electropherogram (or fluorescence lane trace) was inspected visually and assessed for quality. Trailing sequences of low quality were removed, and the sequence itself was loaded via software to a Sybase database (archived daily to 8 mm tape). Leading vector polylinker sequence was removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 or ABI 377 were around 400 bp and depended mostly on the quality of the template used for the sequencing reaction.
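The editing steps described above, removal of leading vector polylinker and of trailing low-quality sequence, can be sketched in Python. This is not the software used in the work described: it assumes a per-base quality score for each call with a hypothetical threshold of 20, and it locates the vector by an exact match to the end of the polylinker, whereas real reads may carry errors in that region. The GGATCC sequence in the example is a stand-in, not the actual vector junction.

```python
def trim_read(seq: str, quals: list[int], vector_end: str, min_q: int = 20) -> str:
    """Remove leading vector and trailing low-quality bases from one read.

    vector_end: last bases of the vector polylinker expected near the start of the read
    min_q: quality threshold below which trailing bases are removed (hypothetical)
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    # Leading vector: drop everything up to and including the polylinker end, if found
    hit = seq.find(vector_end)
    start = hit + len(vector_end) if hit != -1 else 0
    # Trailing low quality: walk back from the 3' end past bases below the threshold
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]

read = "TTGGATCC" + "ACGT" * 10 + "A" * 10
quals = [30] * 48 + [5] * 10
print(trim_read(read, quals, vector_end="GGATCC"))  # ACGT repeated 10 times
```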

INFORMATICS

1. Data Management

A number of information management systems for a large-scale sequencing lab have been developed. (For a review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington, D.C., 585 (1993).) The system used to collect and assemble the sequence data was developed using the Sybase relational database management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation, from template preparation to final analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.
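To illustrate the kind of correlation such a database maintains, the following Python sketch uses SQLite in place of Sybase, with a hypothetical two-table schema linking each template to its forward and reverse reads and to its library's clone size range. The table and column names are assumptions, not the schema of the system described.

```python
import sqlite3

# Hypothetical minimal schema; SQLite stands in for the Sybase system described.
SCHEMA = """
CREATE TABLE template (
    template_id TEXT PRIMARY KEY,
    library     TEXT NOT NULL,
    min_insert  INTEGER,          -- expected clone size range for the library, bp
    max_insert  INTEGER
);
CREATE TABLE seq_read (
    read_id     TEXT PRIMARY KEY,
    template_id TEXT NOT NULL REFERENCES template(template_id),
    primer      TEXT NOT NULL CHECK (primer IN ('M13-21', 'M13RP1')),
    bases       TEXT NOT NULL
);
"""

def mate_pairs(conn: sqlite3.Connection) -> list[tuple[str, str, int, int]]:
    """Forward/reverse reads from the same template, with that template's size range."""
    return conn.execute("""
        SELECT f.read_id, r.read_id, t.min_insert, t.max_insert
        FROM seq_read AS f
        JOIN seq_read AS r ON r.template_id = f.template_id AND r.primer = 'M13RP1'
        JOIN template AS t ON t.template_id = f.template_id
        WHERE f.primer = 'M13-21'
        ORDER BY f.read_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO template VALUES ('T0001', 'pUC18 1.6-2.0 kb', 1600, 2000)")
conn.executemany("INSERT INTO seq_read VALUES (?, ?, ?, ?)", [
    ("T0001F", "T0001", "M13-21", "ACGT"),
    ("T0001R", "T0001", "M13RP1", "TTGA"),
])
print(mate_pairs(conn))
```

Queries of this kind supply the paired-end and clone-size information that the assembly step relies on.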

2. Assembly

An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm, which provides for optimal gapped alignments (Waterman, M. S., Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig. TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
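Three ideas in the assembly description can be illustrated compactly: a 12-mer hash table to find candidate overlaps, Smith-Waterman local alignment to score a candidate, and the mate-pair orientation and distance constraint. The Python sketch below is a simplified illustration, not TIGR Assembler. It scans only the forward strand, computes an alignment score without traceback, uses a linear gap penalty with hypothetical scoring values, and approximates the insert length by the distance between the 5' ends of the two reads.

```python
from collections import defaultdict
from itertools import combinations

K = 12  # oligonucleotide length of the hash table, as stated in the text

def candidate_overlaps(fragments: dict[str, str], k: int = K) -> dict[tuple[str, str], int]:
    """Hash every k-mer to the fragments containing it; count shared k-mers per pair."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1
    return dict(shared)

def partner_counts(pairs: dict[tuple[str, str], int]) -> dict[str, int]:
    """Potential overlap partners per fragment; unusually high counts suggest a repeat."""
    counts: dict[str, int] = defaultdict(int)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (linear gap penalty, no traceback)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def mates_consistent(pos_fwd: int, strand_fwd: str, pos_rev: int, strand_rev: str,
                     min_insert: int, max_insert: int) -> bool:
    """Reads from the two ends of one template must face each other ('+' read on the
    left, '-' read on the right) and lie within the library's clone-size range.
    Positions are the 5' ends of the reads in contig coordinates."""
    if strand_fwd == strand_rev:
        return False
    left, right = (pos_fwd, pos_rev) if strand_fwd == "+" else (pos_rev, pos_fwd)
    return min_insert <= right - left <= max_insert

genome = "ATGCGTACGTTAGCCGATCGATTGCAAGCTTGGCATCGATCGGAATTCCG"
frags = {"a": genome[:30], "b": genome[18:], "c": "T" * 30}
pairs = candidate_overlaps(frags)
print(pairs, partner_counts(pairs))
print(smith_waterman(frags["a"], frags["b"]))
print(mates_consistent(100, "+", 1900, "-", 1600, 2000))
```

Only the overlapping fragments a and b share a 12-mer, so only that pair is passed on to the more expensive alignment step.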

3. Identifying Genes

The predicted coding regions of the Staphylococcus aureus genome were initially defined with the program zorf, which finds ORFs of a minimum length. The predicted coding region sequences were used in searches against a database of all Staphylococcus aureus nucleotide sequences from GenBank (release 92.0), using the BLASTN search method to identify overlaps of 50 or more nucleotides with at least 95% identity. Those ORFs with nucleotide sequence matches are shown in Table 1.
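The ORF search step can be sketched in Python. The rules used by zorf are not given here, so this example assumes ORFs begin at ATG, end at TAA, TAG or TGA, and must contain at least a caller-chosen number of codons before the stop; both strands are scanned, and coordinates refer to the strand scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int) -> list[tuple[str, int, int]]:
    """ATG-to-stop ORFs on both strands with at least `min_codons` codons before the stop.

    Returns (strand, start, end) on the scanned strand; `end` includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG after this stop
    return orfs

example = "ATG" + "GCT" * 5 + "TAA"
print(find_orfs(example, min_codons=6))  # [('+', 0, 21)]
```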

The ORFs without such matches were translated to protein sequences and compared to a non-redundant database of known proteins generated by combining the Swiss-Prot, PIR and GenPept databases. ORFs of at least 80 amino acids that matched a database protein with BLASTP probability less than or equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases. ORFs of at least 120 amino acids that did not match protein or nucleotide sequences in the databases at these levels are shown in Table 3.
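The thresholds that sort ORFs into Tables 1-3 can be written as a small decision function. The sketch assumes the BLASTN overlap length and identity and the BLASTP probability for each ORF have already been obtained; the function and parameter names are hypothetical.

```python
from typing import Optional

def assign_table(orf_len_aa: int,
                 nt_overlap: int = 0,
                 nt_identity: float = 0.0,
                 blastp_p: Optional[float] = None) -> Optional[int]:
    """Return 1, 2 or 3 for the table an ORF belongs in, or None if it is not listed.

    nt_overlap / nt_identity: best BLASTN hit to GenBank S. aureus sequences
    blastp_p: best BLASTP probability against the protein database (None = no hit)
    """
    # Table 1: >= 50 nt overlap at >= 95% identity with a known S. aureus sequence
    if nt_overlap >= 50 and nt_identity >= 0.95:
        return 1
    # Table 2: >= 80 aa with a protein match at P <= 0.01
    if blastp_p is not None and blastp_p <= 0.01:
        return 2 if orf_len_aa >= 80 else None
    # Table 3: >= 120 aa with no match at the levels above
    return 3 if orf_len_aa >= 120 else None

print(assign_table(300, nt_overlap=120, nt_identity=0.99))  # 1
print(assign_table(95, blastp_p=1e-6))                      # 2
print(assign_table(150, blastp_p=0.3))                      # 3
print(assign_table(100))                                    # None
```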

ILLUSTRATIVE APPLICATIONS

1. Production of an Antibody to a Staphylococcus aureus Protein

Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.

2. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted, and aliquots of the dilution are placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).

3. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with either inadequate or excessive doses of antigen resulting in low-titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).

Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Weir, D. M., ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (on the order of 1 µM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. for Microbiology, Washington, D.C. (1980).
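A mass concentration of antibody can be converted to a molar one by dividing by molecular mass. This Python sketch assumes the antibody is IgG of about 150 kDa, an approximation not stated in the text; the function name is hypothetical.

```python
IGG_KDA = 150.0  # assumed molecular mass of IgG, kDa

def mg_per_ml_to_uM(mg_per_ml: float, mass_kda: float = IGG_KDA) -> float:
    """Convert mg/ml (= g/l) to µM: divide by g/mol, then scale mol/l to µmol/l."""
    grams_per_mol = mass_kda * 1000.0
    return mg_per_ml / grams_per_mol * 1e6

for c in (0.1, 0.2):
    print(f"{c} mg/ml ≈ {mg_per_ml_to_uM(c):.2f} µM")
```

For 0.1 to 0.2 mg/ml this gives roughly 0.7 to 1.3 µM.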

Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. In addition, they are useful in various animal models of staphylococcal disease known to those of skill in the art, as a means of evaluating the protein used to make the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic reagent.

3. Preparation of PCR Primers and Amplification of DNA

Various fragments of the Staphylococcus aureus genome, such as those of Tables 1-3 and SEQ ID NOS: 1-5,191, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases, in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that their melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
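The length and G/C guidance for primer pairs can be checked programmatically. This Python sketch computes G/C fraction and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a crude approximation for short oligonucleotides; the 0.05 G/C tolerance and the function names are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def check_pair(fwd: str, rev: str, min_len: int = 18, max_gc_diff: float = 0.05) -> list[str]:
    """Return a list of problems with a primer pair (empty if none found)."""
    problems = []
    for name, p in (("forward", fwd), ("reverse", rev)):
        if len(p) < min_len:
            problems.append(f"{name} primer shorter than {min_len} nt")
    if abs(gc_fraction(fwd) - gc_fraction(rev)) > max_gc_diff:
        problems.append("G/C content differs; melting temperatures may not match")
    return problems

fwd, rev = "ATGCATGCATGCATGCAT", "TACGTACGTACGTACGTA"
print(wallace_tm(fwd), wallace_tm(rev), check_pair(fwd, rev))  # 52 52 []
```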

4. Gene Expression from DNA Sequences Corresponding to ORFs

A fragment of the Staphylococcus aureus genome provided in Tables 1-3 is introduced into an expression vector using conventional technology. Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Vectors and expression systems are commercially available from a variety of suppliers, including Stratagene (La Jolla, California), Promega (Madison, Wisconsin) and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.

The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Staphylococcus aureus genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Staphylococcus aureus DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Staphylococcus aureus DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Staphylococcus aureus DNA 3' primer, taking care to ensure that the Staphylococcus aureus DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt-ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
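Adding restriction sites to the PCR primers, as described above, amounts to prefixing each gene-specific sequence with a recognition site. This Python sketch places a PstI site (CTGCAG) on the 5' primer and a BglII site (AGATCT) on the 3' primer; the four-base GCGC clamp, which gives the enzymes room to cut near the DNA end, and the 18-base annealing length are hypothetical choices, not values from the text.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence
CLAMP = "GCGC"         # hypothetical extra 5' bases so the enzyme can cut near the end

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def expression_primers(orf: str, anneal_len: int = 18) -> tuple[str, str]:
    """5' primer: clamp + PstI site + first bases of the ORF.
    3' primer: clamp + BglII site + reverse complement of the last bases of the ORF."""
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF shorter than the annealing length")
    fwd = CLAMP + PSTI_SITE + orf[:anneal_len]
    rev = CLAMP + BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return fwd, rev

orf = "ATG" + "GCT" * 10 + "TAA"  # toy ORF, not a S. aureus sequence
fwd, rev = expression_primers(orf)
print(fwd)
print(rev)
```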

23 The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin 

24 (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the 

25 product specification. Positive transfectants are selected after growing the transfected 

26 cells in 600 ug/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably 

27 released into the supernatant. However if the protein has membrane binding domains, 

28 the protein may additionally be retained within the cell or expression may be restricted 

29 to the cell surface. Since it may be necessary to purify and locate the transfected 

30 product, synthetic 15-mer peptides synthesized from the predicted Staphylococcus 

31 aureus DNA sequence are injected into mice to generate antibody to the polypeptide 

32 encoded by the Staphylococcus aureus DNA. 

Alternatively, if antibody production is not possible, the Staphylococcus aureus DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin moiety is then used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the globin moiety and the polypeptide encoded by the Staphylococcus aureus DNA so that the latter may be freed from the globin moiety by simple protease digestion. One useful expression vector for generating globin chimeras is pSG5 (Stratagene). This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance representatives of Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be produced using in vitro translation systems such as the In Vitro Express™ Translation Kit (Stratagene).
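The synthetic 15-mer peptides mentioned above start from the polypeptide sequence predicted from the Staphylococcus aureus DNA. A minimal Python sketch of that path is given here under stated assumptions: it translates an in-frame ORF with the standard genetic code, parses an antigenic-region entry of the kind listed in Table 4 (assumed to be 1-based, inclusive residue numbers), and tiles overlapping 15-mers across the region. The 5-residue step, the centring of short regions, and all function names are illustrative choices, not taken from the source. The parser rejects ranges that run backwards, such as the 83-32 entry for ORF 46.5 in Table 4.

```python
import re

# Standard genetic code; codons are enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame ORF up to the first stop codon.

    Assumes the ORF starts at codon 1; alternative bacterial start codons
    (e.g. GTG, TTG) are translated as ordinary codons, not as Met.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def parse_region(text: str) -> tuple[int, int]:
    """Parse an antigenic-region entry such as '36-45' (1-based, inclusive)."""
    match = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not a residue range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"range runs backwards: {text!r}")
    return start, end


def peptides_for_region(protein: str, start: int, end: int,
                        length: int = 15, step: int = 5) -> list[str]:
    """Overlapping peptides of `length` residues that cover start..end."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("region lies outside the protein")
    if len(protein) < length:
        raise ValueError("protein is shorter than one peptide")
    span = end - start + 1
    width = max(span, length)
    # Centre short regions in a window of `width` residues, clamped to the protein.
    lo = max(0, start - 1 - (width - span) // 2)
    hi = min(len(protein), lo + width)
    lo = hi - width
    starts = list(range(lo, hi - length + 1, step))
    if starts[-1] != hi - length:
        starts.append(hi - length)  # make sure the window's C-terminal end is covered
    return [protein[s:s + length] for s in starts]
```

In use, a region string from the table would be passed through `parse_region` and the resulting residue numbers handed to `peptides_for_region` together with `translate(orf)`; overlapping windows are one common way to make sure every residue of a region appears in at least one synthetic peptide.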

While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.

All patents, patent applications and publications referred to above are hereby incorporated by reference.

Table 4

2194411

ORF | SEQ ID NO | BLAST HOMOLOG | Antigenic Regions (Region 1, Region 2, ... in order)
168.6 | 5192 | lipoprotein | 36-45, 84-103, 152-161, 176-185, 244-272, 303-315
238.1 | 5193 | chrA | 21-39, 48-58, 84-95, 232-249, 260-269, 291-301, 308-317
51.2 | 5194 | OppB gene product (B. sub | 20-36, 70-79, 100-112, 121-131, 140-152, 188-208, 211-220, 256-266, 273-283
278.3 | 5195 | lipoprotein | 20-29, 59-73, 85-97, 162-171, 198-209
276.2 | 5196 | lipoprotein | 21-33, 65-74, 177-186, 211-220, 255-268
45.4 | 5197 | ProX | 28-37, 59-69, 85-100, 120-129, 177-199, 221-230, 234-243, 268-279, 284-293, 304-313
316.8 | 5198 | hypothetical protein | 45-54, 88-97, 182-192, 243-253
154.15 | 5199 | unknown | 31-40, 48-58, 79-88, 95-104, 148-157, 177-187, 202-211
228.3 | 5200 | unknown | 25-38, 40-52, 64-74, 80-89, 101-119, 139-154, 166-181
228.6 | 5201 | unknown | 29-41, 89-101, 128-143, 173-184
50.1 | 5202 | unknown | 21-33, 52-61, 168-182, 197-206
112.7 | 5203 | iron-binding periplasmic | 21-31, 58-67, 92-101, 111-120, 136-149, 197-211, 218-229, 253-273
442.1 | 5204 | unknown | 30-39, 91-100, 122-137, 182-192, 199-210, 247-257, 264-277, 287-309
66.2 | 5205 | unknown | 50-59, 104-116, 127-136, 167-182
304.2 | 5206 | Q-binding periplasmic | 19-28, 48-57, 75-84, 103-116, 178-187, 250-259
44.1 | 5207 | hypothetical protein | 27-36, 86-95, 129-138, 192-201
161.4 | 5208 | SphX | 27-44, 149-161, 166-175, 201-210
46.5 | 5209 | cmpC (permease) | 21-33, 61-70, 83-32, 100-109, 131-141, 162-176, 206-215, 243-252, 264-273, 285-294, 306-315
942.1 | 5210 | traH [Plasmid pSK41] | 83-92, 109-118, 127-142
5.4 | 5211 | ORF (S. aureus) | 12-22, 87-96, 111-120, 151-160, 189-205, 230-239, 246-264, 301-318, 340-354, 378-387, 393-407, 416-426, 456-465
20.4 | 5212 | peptidoglycan hydrolase (S | 24-34, 129-138, 141-150, 161-171, 202-212, 217-234, 260-275, 314-336, 366-373, 380-391, 396-405, 410-419, 461-481
328.2 | 5213 | lipoprotein (H. flu) | 81-90, 123-133, 290-299
520.2 | 5214 | fibronectin binding protein | 44-54, 63-79, 81-90, 95-110
771.1 | 5215 | emm1 gene product (S. py | 30-39, 65-82, 96-106, 112-121, 145-154
999.1 | 5216 | predicted trithorax prot. (D | 7-16, 120-129, 157-166
853.1 | 5217 | ORF2136 (Marchantia polyr | 43-52, 88-97, 102-111
287.1 | 5218 | psaA homolog | 13-22, 28-44, 72-82, 114-124, 154-164
288.2 | 5219 | cell wall enzyme | 14-23, 89-98
596.2 | 5220 | penicillin binding protein 2b | 40-49, 59-68, 76-87, 106-115, 121-130
217.5 | 5221 | fibronectin/fibrinogen bindi | 28-37, 40-49, 62-71, 93-111, 244-253, 259-268, 288-297, 302-311
217.6 | 5222 | fibronectin/fibrinogen bp | 10-19, 31-40, 54-62, 73-92, 144-158, 174-183, 188-197, 207-216, 226-242
528.3 | 5223 | myosin cross reactive prote | 4-13, 29-47, 60-73, 90-99
171.11 | 5224 | EF | 20-31, 91-110
63.4 | 5225 | penicillin binding protein 2b | 12-21, 59-68, 95-104
353.2 | 5226 |  | 46-55, 62-71
743.1 | 5227 | 29 kDa protein in fimA regi | 23-32, 68-79, 94-103, 175-184, 197-207
342.4 | 5228 | Twitching motility | 10-19, 48-60, 83-92, 111-121
69.3 | 5229 | arabinogalactan protein | 97-106, 132-141, 158-167, 180-189, 195-211
70.6 | 5230 | nodulin | 36-45, 48-57, 137-160, 179-188, 206-215, 263-272, 291-301, 331-340, 358-371, 390-414, 453-471, 506-515
129.2 | 5231 | glycerol diester phosphodie | 8-17, 41-50, 55-74, 97-106, 117-127, 141-157, 168-183, 202-211, 222-231, 261-270, 296-315
58.5 | 5232 | PBP (S. aureus) | 26-35, 70-79, 117-126, 152-161, 184-203, 260-269, 275-299, 330-344, 372-381, 424-433
188.3 | 5233 | MHC class II analog (S. aure | 72-81, 94-103, 115-124, 136-145
236.6 | 5234 | histidine kinase domain (Di | 24-33, 52-67, 81-94, 106-121, 138-147, 163-172, 187-198, 244-261, 268-278, 308-317, 358-377, 410-423, 428-439, 442-457, 467-476, 480-493
310.8 | 5235 | clumping factor (S. aureus) | 59-71, 77-86, 93-102, 118-127, 131-140, 144-153, 177-186, 190-199, 204-213, 216-227, 238-251, 256-275, 281-290, 296-310, 314-333, 338-347, 357-366, 370-379, 429-438, 443-452, 478-487, 551-560, 622-632, 670-685, 708-718, 823-836, 858-867, 877-886
601.1 | 5236 | novel antigen/ORF2 (S. au | 45-54, 91-104, 108-117, 186-195, 208-218
544.3 | 5237 | ORF YJR151c (S. cerevisiae) | 76-90, 101-111, 131-140, 154-164, 170-179, 184-193, 224-235, 274-287, 327-336, 352-361
662.1 | 5238 | MHC class II analog (S. aure | 22-32, 71-80, 89-98, 114-122
87.7 | 5239 | 5' nucleotidase precursor | 29-45, 62-71, 105-114, 125-137
120.1 | 5240 | B65G gene product (B. sub | 102-111

Table 4 (cont)

ORF | SEQ ID NO | BLAST HOMOLOG | Antigenic Regions (Region 1, Region 2, ... in order)
46.1 | 5241 | aldehyde dehydrogenase | 8-17, 36-52, 83-96, 112-121, 215-242, 333-352, 376-385, 416-432, 471-487
63.4 | 5242 | glycerol ester hydrolase (P. | 9-26, 57-73, 93-107, 123-133, 145-154, 191-202, 212-223, 245-265, 274-283, 291-300, 306-315, 319-328, 366-376, 395-420, 453-462, 467-476, 485-500, 513-525
174.6 | 5243 | ketopantoate hydroxymeth | 71-80, 203-212, 242-254, 265-274
206.16 | 5244 | ornithine acetyltransferase | 1-10, 34-43, 54-63, 194-210, 239-259, 275-284
267.1 | 5245 | NaH-antiporter protein (E. f | 120-129, 332-347, 398-408
322.1 | 5246 | acriflavin resistance protein | 58-75, 153-164, 203-231, 264-284, 298-319, 350-359
415.2 | 5247 | transport ATP-binding prot | 108-126, 218-227, 298-308, 315-334, 344-353, 371-380, 395-404, 456-465, 486-495, 518-527, 539-555
214.3 | 5248 | 2-nitropropane dioxygenase | 123-136, 216-233, 283-292, 297-306, 318-337, 365-375
587.3 | 5249 | clumping factor | 5-14, 43-54, 59-68, 76-95, 106-115, 142-151, 156-166, 173-182, 186-198, 204-213, 217-226, 278-287, 318-327, 332-342, 351-360, 377-386, 396-405, 426-442, 459-470, 485-494, 505-514, 531-562, 567-578, 584-601, 607-840, 844-854, 858-870, 877-886, 889-911, 927-936
685.1 | 5250 | signal peptidase | 59-68, 72-81, 86-95, 99-108, 113-122, 130-145
54.3 | 5251 | fibronectin binding protein | 23-32, 37-46, 50-59, 89-98, 128-138, 185-194, 217-226, 251-260, 268-277, 295-305, 316-325, 329-345, 355-372, 387-396, 416-425, 43F448, 455-462, 472-491, 517-536
54.4 | 5252 | fibronectin binding protein | 43-52, 66-75, 95-104, 147-156, 175-188, 191-200, 203-212, 220-229
54.5 | 5253 | fibronectin binding protein ( | 49-60, 81-90
54.6 | 5254 | fibronectin binding protein ( | 55-71, 82-97, 139-158, 175-186, 220-230, 287-304, 317-326, 344-353, 364-373, 378-387, 396-407, 427-436, 514-531, 541-550, 569-578, 612-622, 639-648, 673-681, 703-715, 723-732, 749-760, 772-788, 793-802, 811-826, 834-848, 866-876, 893-903, 907-918, 925-944, 951-997
328.1 | 5255 | lipoprotein (H. flu) | 11-20, 61-70, 96-105





Figure 1 
Figure 2